T-DNA promoters of the Ri plasmid.

The sequence of the TL-DNA of Ri plasmids found in
Agrobacterium rhizogenes strains HRI and A4 is disclosed.
Sixteen open reading frames bounded by eukaryotic promoters,
ribosome binding sites, and polyadenylation sites were
found, five of which were observed to be transcribed in a
developmentally and phenotypically regulated manner. The
use of promoters and polyadenylation sites from pRi TL-DNA
to control expression of heterologous foreign structural
genes is taught, using as examples the structural genes for
Phaseolus vulgaris storage protein (phaseolin), P. vulgaris
lectin, a sweet protein (thaumatin), and Bacillus thuringiensis
crystal protein. Vectors useful for manipulation of sequences
of the structural genes and T-DNA are also provided.
Ri T-DNA PROMOTERS

FIELD

The present invention is in the fields of genetic engineering and
plant husbandry, and especially provides means for promotion of
transcription in plants.

BACKGROUND 

Following are publications which disclose background information
related to the present invention. These publications are discussed in
greater depth in the Background sections indicated. Restriction maps of
Ri plasmids are disclosed by G. A. Huffman et al. (1984) J. Bacteriol.
157:269-276; L. Jouanin (1984) Plasmid 12:91-102; and M. Pomponi et al.
(1983) Plasmid 10:119-129 (see TIP Plasmid DNA). L. Herrera-Estrella
et al. (1983) Nature 303:209-213, provides examples of use of the nos
promoter to drive expression in plants of heterologous foreign structural
genes. N. Murai et al. (1983) Science 222:476-482, reported that the ocs
promoter could drive expression of an intron-containing fusion gene having
foreign coding sequences (Manipulations of the TIP Plasmids). R. F.
Barker et al. (1983) Plant Molec. Biol. 2:335-350, and R. F. Barker and
J. D. Kemp, U.S. Patent application ser. no. 553,786, disclose the complete
sequence of the T-DNA from the octopine-type plasmid pTi15955; homologous
published sequences of other Ti plasmid genes are referenced therein.
Barker and Kemp also taught use of various octopine T-DNA promoters to
drive expression in plants of various structural genes (Genes on the TIP
Plasmids).

Shuttle Vectors 

Shuttle vectors, developed by G. B. Ruvkun and F. M. Ausubel (1981)
Nature 289:85-88, which provide means for inserting foreign genetic
material into large DNA molecules, include copies of recipient genome DNA
sequences into which the foreign genetic material is inserted. Shuttle
vectors can be introduced into a recipient cell by well known methods,
including the tri-parental mating technique (Ruvkun and Ausubel, supra),
direct transfer of a self-mobilizable vector in a bi-parental mating,
direct uptake of exogenous DNA by Agrobacterium cells ("transformation"),
spheroplast fusion of Agrobacterium with another bacterial cell, and
uptake of liposome-encapsulated DNA. After a shuttle vector is introduced
into a recipient cell, possible events include a double cross-over with
one recombinational event on either side of the marker
(homogenotization). Phenotypically dominant traits may be introduced by
single cross-over events (cointegration) (A. Caplan et al. (1983) Science
222:815-821; R. B. Horsch et al. (1984) Science 223:496-498); one must
guard against deletion of the resulting tandem duplication. Shuttle
vectors have proved useful in manipulation of Agrobacterium plasmids.

"Suicide vectors" (e.g. R. Simon et al. (1983) Biotechnol. 1:784-791)
are shuttle vectors having replicons not independently maintainable
within the recipient cell. Use of suicide vectors to transfer DNA
sequences into a Ti plasmid has been reported (e.g. E. Van Haute et al.
(1983) EMBO J. 2:411-417; L. Comai et al. (1983) Plasmid 10:21-30;
P. Zambryski et al. (1983) EMBO J. 2:2143-2150; P. Zambryski et al. (1984)
in Genetic Engineering, Principles, and Methods, 6, eds.: A. Hollaender
and J. Setlow; P. Zahn et al. (1984) Mol. Gen. Genet. 194:188-194;
Caplan et al., supra; and C. H. Shaw et al. (1983) Gene 28:315-330).

Overview of Agrobacterium 

Included within the gram-negative genus Agrobacterium are the species
A. tumefaciens and A. rhizogenes, respectively the causal agents of crown
gall disease and hairy root disease of gymnosperm and dicotyledonous
angiosperm plants. In both diseases, the inappropriately growing plant
tissue usually produces one or more amino acid derivatives known as
opines, which may be classified into families whose type members include
octopine, nopaline, mannopine, and agropine.

Virulent strains of Agrobacterium harbor large plasmids known as Ti
(tumor-inducing) plasmids (pTi) in A. tumefaciens and Ri (root-inducing)
plasmids (pRi) in A. rhizogenes, often classified by the opine which they
cause to be synthesized. Ti and Ri plasmids both contain DNA sequences,
referred to as T-DNA (transferred-DNA), which in tumors are found to be
integrated into the genome of the host plant. Several T-DNA genes are
under control of T-DNA promoters which resemble the canonical eukaryotic
promoter in structure. The Ti plasmid also carries genes outside the
T-DNA region. The set of genes and DNA sequences responsible for
transforming the plant cell are hereinafter collectively referred to as
the transformation-inducing principle (TIP). The term TIP therefore
includes, but is not limited to, both Ti and Ri plasmids.

General reviews of Agrobacterium-caused disease include those by
D. J. Merlo (1982) Adv. Plant Pathol. 1:139-178; L. W. Ream and M. P.
Gordon (1982) Science 218:854-859; M. W. Bevan and M.-D. Chilton (1982)
Ann. Rev. Genet. 16:357-384; G. Kahl and J. Schell (1982) Molecular
Biology of Plant Tumors; K. A. Barton and M.-D. Chilton (1983) Meth.
Enzymol. 101:527-539; A. Depicker et al. (1983) in Genetic Engineering of
Plants: an Agricultural Perspective, eds.: T. Kosuge et al., pp. 143-176;
A. Caplan et al. (1983) Science 222:815-821; T. C. Hall et al., European
Patent application 126,546; and A. N. Binns (1984) Oxford Surveys Plant
Mol. Cell Biol. 1:130-160. A number of more specialized reviews can be
found in A. Pühler, ed. (1983) Molecular Genetics of the Bacteria-Plant
Interaction, including a treatment by D. Tepfer of A. rhizogenes-mediated
transformation (pp. 248-258). R. A. Schilperoort (1984) in Efficiency in
Plant Breeding (Proc. 10th Congr. Eur. Assoc. Res. Plant Breeding), eds.:
W. Lange et al., pp. 251-285, discusses Agrobacterium-based plant
transformation in the context of the art of plant genetic engineering and
plant improvement.

Infection of Plant Tissues 

Plant cells can be transformed by Agrobacterium by several methods
known to the art. For a review of recent work, see K. Syono (1984) Oxford
Surveys Plant Mol. Cell Biol. 1:217-219. In the present invention, any
method will suffice as long as the gene is stably transmitted through
mitosis and meiosis.

The infection of plant tissue by Agrobacterium is a simple technique
well known to those skilled in the art. Typically, after being wounded, a
plant is inoculated with a suspension of tumor-inducing bacteria.
Alternatively, tissue pieces are inoculated, e.g. leaf disks (R. B. Horsch
et al. (1985) Science 227:1229-1231) or inverted stem segments (K. A.
Barton et al. (1983) Cell 32:1033-1043). After induction, the tumors can
be placed in tissue culture on media lacking the phytohormones usually
included for culture of untransformed plant tissue. Traditional
inoculation and culture techniques may be modified for use of disarmed
T-DNA vectors incapable of inducing hormone-independent growth (e.g. see
P. Zambryski et al. (1984) in Genetic Engineering, Principles, and
Methods, 6, eds.: A. Hollaender and J. Setlow).

Agrobacterium is also capable of infecting isolated cells, cells
grown in culture, callus cells, and isolated protoplasts (e.g. R. B.
Horsch and R. T. Fraley (1983) in Advances in Gene Technology: Molecular
Genetics of Plants and Animals (Miami Winter Symposium 20), eds.:
K. Downey et al., p. 576; R. T. Fraley et al. (1984) Plant Mol. Biol.
3:371-378; R. T. Fraley and R. B. Horsch (1983) in Genetic Engineering of
Plants: an Agricultural Perspective, eds.: T. Kosuge et al., pp. 177-194;
A. Muller et al. (1983) Biochem. Biophys. Res. Comm. 123:458-462).
The transformation frequency of inoculated callus pieces can be increased
by addition of an opine or opine precursors (L. M. Cello and W. L. Olsen,
U.S. Patent 4,459,355).

Plant protoplasts can be transformed by the direct uptake of TIP DNA
in the presence of a polycation, polyethylene glycol, or both (e.g. F. A.
Krens et al. (1982) Nature 296:72-74), though integrated Ti plasmid may
include non-T-DNA sequences.

An alternative method involves uptake of DNA surrounded by membranes.
pTi-DNA may be introduced via liposomes or by fusion of plant and
bacterial cells after removal of their respective cell walls (e.g. R. Hain
et al. (1984) Plant Cell Rept. 2:60-64). Plant protoplasts can take up
cell wall delimited Agrobacterium cells. T-DNA can be transmitted to
tissue regenerated from fused protoplasts.

The host range of crown gall pathogenesis may be influenced by
T-DNA-encoded functions such as onc genes (A. Hoekema et al. (1984)
J. Bacteriol. 158:383-385; A. Hoekema et al. (1984) EMBO J. 3:3043-3047;
W. C. Buchholz and M. F. Thomashow (1984) 160:327-332). R. L. Ausich,
European Patent Application 108,580, reports transfer of T-DNA from
A. tumefaciens to green algal cells, and expression therein of octopine
synthase and Tn5 kanamycin resistance genes. G. M. S. Hooykaas-van
Slogteren et al. (1984) Nature 311:763-764, and J.-P. Hernalsteens
et al. (1984) EMBO J. 3:3039-3041, have demonstrated transformation of
monocot cells by Agrobacterium without the customary tumorigenesis.

Regeneration of Plants

Differentiated plant tissues with normal morphology have been
obtained from crown gall tumors. For example, L. Otten et al. (1981)
Molec. Gen. Genet. 183:209-213, used tms (shoot-inducing, root-suppressing)
Ti plasmid mutants to create tumors which proliferated shoots that formed
self-fertile flowers. The resultant seeds germinated into plants which
contained T-DNA and made opines. The tms phenotype can be partly overcome
by washing of the rooting area and can be bypassed by grafting onto a
normal stock (A. Wostemeyer et al. (1984) Mol. Gen. Genet. 194:500-507).
Similar experiments with a tmr (root-inducing, shoot-suppressing) mutant
showed that full-length T-DNA could be transmitted through meiosis to
progeny and that in those progeny nopaline genes could be expressed,
though at variable levels (K. A. Barton et al. (1983) Cell 32:1033-1043).

Genes involved in opine anabolism were capable of passing through
meiosis, though the plants were male sterile if the T-DNA was not
disarmed. Seemingly unaltered T-DNA and functional foreign genes can be
inherited in a dominant, closely linked, Mendelian fashion. Genetically,
T-DNA genes are closely linked in regenerated plants (A. Wostemeyer et al.
(1984) Mol. Gen. Genet. 194:500-507; R. B. Horsch et al. (1984) Science
223:496-498; D. Tepfer (1984) Cell 37:959-967).

The epigenetic state of the plant cells initially transformed can
affect regeneration potential (G. M. S. van Slogteren et al. (1983) Plant
Mol. Biol. 2:321-333).

Roots resulting from transformation by A. rhizogenes have proven
relatively easy to regenerate directly into plantlets (M.-D. Chilton
et al. (1982) Nature 295:432-434; D. Tepfer (1984) Cell 37:959-967; Tepfer
(1983) in Pühler, supra), and are easily cloned. Regenerability from
transformed roots may be dependent on T-DNA copy number (C. David et al.
(1984) Biotechnol. 2:73-76). Hairy root regenerants have a rhizogenic
potential and isozyme pattern not found in untransformed plants
(P. Costantino et al. (1984) J. Mol. Appl. Genet. 2:465-470). The
phenotype of these plants is generally altered, although not necessarily
deleteriously.

Genes on the TIP Plasmids

The complete sequence of the T-DNA of an octopine-type plasmid found
in ATCC 15955, pTi15955, has been reported (R. F. Barker et al. (1983)
Plant Molec. Biol. 2:335-350), as has that of the TL region of pTiAch5
(J. Gielen et al. (1984) EMBO J. 3:835-846). Published T-DNA genes do not
contain introns and do have sequences that resemble canonical eukaryotic
promoter elements and polyadenylation sites.
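
By way of illustration only, signals of this kind can be looked for in a
sequence with a simple pattern search. The following Python sketch is not
part of the work described here; the function name find_signals, the
200 bp search window, and the simplified consensus patterns (TATA[AT]A for
a TATA-like box, AATAAA for a polyadenylation signal) are assumptions
chosen for the example, and a match is only a candidate, not evidence of
function.

```python
import re

# Simplified consensus patterns (illustrative assumptions).
TATA_BOX = re.compile(r"TATA[AT]A")    # TATA-like promoter element
POLY_A_SIGNAL = re.compile(r"AATAAA")  # canonical polyadenylation signal


def find_signals(seq, orf_start, orf_end, window=200):
    """Locate candidate signals flanking an ORF on the sense strand.

    seq       -- DNA sequence as a string
    orf_start -- 0-based index of the first base of the start codon
    orf_end   -- 0-based index one past the last base of the stop codon
    window    -- number of flanking bases searched on each side
    Returns a dict of 0-based match positions in seq.
    """
    seq = seq.upper()
    up_from = max(0, orf_start - window)
    upstream = seq[up_from:orf_start]
    downstream = seq[orf_end:orf_end + window]
    tata = [up_from + m.start() for m in TATA_BOX.finditer(upstream)]
    poly_a = [orf_end + m.start() for m in POLY_A_SIGNAL.finditer(downstream)]
    return {"tata": tata, "poly_a": poly_a}


# Toy sequence: TATA-like box, spacer, a short ORF, spacer, AATAAA.
toy = "GG" + "TATAAA" + "C" * 20 + "ATG" + "GCT" * 5 + "TAA" + "C" * 10 + "AATAAA" + "GG"
signals = find_signals(toy, orf_start=28, orf_end=49)
```

In the toy sequence the ORF occupies positions 28 to 48, so the search
covers the 28 bases before it and the bases after position 48.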

Ti plasmids having mutations in the genes tms, tmr, tml, and ocs
respectively incite tumorous calli of Nicotiana tabacum which generate
shoots, proliferate roots, are larger than normal, and do not synthesize
octopine; all but ocs are onc (oncogenicity) genes. In other hosts,
mutants of these genes can induce different phenotypes (see M. W. Bevan
and M.-D. Chilton (1982) Ann. Rev. Genet. 16:357-384). Mutations in T-DNA
genes do not seem to affect the insertion of T-DNA into the plant genome
(J. Leemans et al. (1982) EMBO J. 1:147-152; L. W. Ream et al. (1983)
Proc. Natl. Acad. Sci. USA 80:1660-1664).

Octopine Ti plasmids carry an ocs gene which encodes octopine
synthase (lysopine dehydrogenase). All upstream signals necessary for
expression of the ocs gene are found within 295 bp of the ocs
transcriptional start site (C. Koncz et al. (1983) EMBO J. 2:1597-1603).
P. Dhaese et al. (1983) EMBO J. 2:419-426, reported the utilization of
various polyadenylation sites by "transcript 7" (ORF3 of Barker et al.,
supra) and ocs. The presence of the enzyme octopine synthase within a
tissue can protect that tissue from the toxic effect of various amino acid
analogs (G. A. Dahl and J. Tempé (1983) Theor. Appl. Genet. 66:233-239;
M. G. Koziel et al. (1984) J. Mol. Appl. Genet. 2:549-562).

Nopaline Ti plasmids encode the nopaline synthase gene (nos)
(sequenced by A. Depicker et al. (1982) J. Mol. Appl. Genet. 1:561-573).
The "CAAT" box, but not upstream sequences therefrom, is required for
wild-type levels of nos expression; a partial or complete "TATA" box
supports very low level nos activity (C. H. Shaw et al. (1984) Nucl. Acids
Res. 12:7831-7846). Genes equivalent to tms and tmr have been identified
on a nopaline-type plasmid and a number of transcripts have been mapped
(L. Willmitzer et al. (1983) Cell 32:1045-1056).

Transcription from hairy root T-DNA has also been detected
(L. Willmitzer et al. (1982) Mol. Gen. Genet. 186:16-22). Ri plasmids and
tms⁻ Ti plasmids can complement each other when inoculated onto plants,
resulting in calli capable of hormone-independent growth (G. M. S.
van Slogteren (1983) Ph.D. thesis, Rijksuniversiteit te Leiden,
Netherlands).

TIP plasmid genes outside of the T-DNA region include the vir genes,
which when mutated result in an avirulent Ti plasmid. Several vir genes
have been accurately mapped and have been found to be located in regions
conserved among various Ti plasmids (V. N. Iyer et al. (1982) Mol. Gen.
Genet. 188:418-424). The vir genes function in trans, being capable of
causing the transformation of plant cells with T-DNA of a different
plasmid type and physically located on another plasmid (e.g. A. J.
de Framond et al. (1983) Biotechnol. 1:262-269; A. Hoekema et al. (1983)
Nature 303:179-180; J. Hille et al. (1984) J. Bacteriol. 158:754-756;
A. Hoekema et al. (1984) J. Bacteriol. 158:383-385); such arrangements are
known as binary systems. Chilton et al. (18 January 1983) 15th Miami
Winter Symp., described a "micro-Ti" plasmid made by resectioning the
"mini-Ti" of de Framond et al., supra (see European Patent application
126,546 for a description). G. A. Dahl et al., U.S. Patent application
ser. no. 532,280, and A. Hoekema (1985) Ph.D. Thesis, Rijksuniversiteit te
Leiden, The Netherlands, disclose micro-Ti plasmids carrying ocs genes
constructed from pTi15955. M. Bevan (1984) Nucl. Acids Res. 12:8711-8721,
discloses a kanamycin-resistant micro-Ti. T-DNA need not be on a plasmid
to transform a plant cell; chromosomally located T-DNA is functional
(A. Hoekema et al. (1984) EMBO J. 3:2485-2490). Ti plasmid-determined
characteristics have been reviewed by Merlo, supra (see especially
Table II therein), and Ream and Gordon, supra.
TIP Plasmid DNA

Ri plasmids have been shown to have extensive homology among
themselves (P. Costantino et al. (1981) Plasmid 5:170-182), and to both
octopine (F. F. White and E. W. Nester (1980) J. Bacteriol. 144:710-720)
and nopaline (G. Risuleo et al. (1982) Plasmid 7:45-51) Ti plasmids,
primarily in regions encoding vir genes, replication functions, and opine
metabolism functions (L. Jouanin (1984) Plasmid 12:91-102; K. Lahners
et al. (1984) Plasmid 11:130-140; E. E. Hood et al. (1984) Biotechnol.
2:702-709; F. Leach (1983) Ph.D. Thesis, Université de Paris-Sud, Centre
d'Orsay, France); none of the homologies are in pRi TL-DNA. pRi T-DNA
contains extensive though weak homologies to T-DNA from both types of Ti
plasmid (L. Willmitzer et al. (1982) Mol. Gen. Genet. 186:16-22). DNA from
several plant species contains sequences, referred to as cT-DNA (cellular
T-DNA), having homology with the Ri plasmid (F. F. White et al. (1983)
Nature 301:348-350; L. Spano et al. (1982) Plant Molec. Biol. 1:291-300;
D. Tepfer (1982) in 2e Colloque sur les Recherches Fruitieres Bordeaux,
pp. 47-59). G. A. Huffman et al. (1984) J. Bacteriol. 157:269-276,
Jouanin, supra, and Leach, supra, have shown that, in the region of
cross-hybridization, the Ri plasmid pRiA4b is more closely related to
pTiA6 (octopine-type) than to pTiT37 (nopaline-type) and that this Ri
plasmid appears to carry sequences homologous to tms but not tmr. Their
results also suggested that Ri T-DNA may be discontinuous, analogous to
the case with octopine T-DNA (see below). The restriction maps of pRiA4b,
pRi1855, and pRiHRI were respectively disclosed by Huffman et al., supra,
M. Pomponi et al. (1983) Plasmid 10:119-129, and L. Jouanin, supra. Ri
plasmids are often characterizable as being agropine-type or
mannopine-type (A. Petit et al. (1983) Mol. Gen. Genet. 190:204-214).

A portion of the Ti or Ri plasmid is found in the DNA of tumorous
plant cells. T-DNA may be integrated (i.e. inserted) into host DNA at
multiple sites in the nucleus. Flanking plant DNA may be either repeated
or low copy number sequences. Integrated T-DNA can be found in either
direct or inverted tandem arrays and can be separated by spacers. Much
non-T-DNA Ti plasmid DNA appears to be transferred into the plant cell
prior to T-DNA integration (H. Joos et al. (1983) EMBO J. 2:2151-2160).
T-DNA has direct repeats of about 25 base pairs associated with the
borders, i.e. with the T-DNA/plant DNA junctions, which may be involved in
either transfer from Agrobacterium or integration into the host genome.
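
As a minimal sketch of how direct repeats of this length might be located
in a sequence, the following Python example indexes every 25-base window
and reports those that occur more than once on the same strand. The
function name direct_repeats and the toy sequence are hypothetical, and
exact matching is a simplifying assumption: repeats found in practice may
differ at some positions, which this sketch would miss.

```python
from collections import defaultdict


def direct_repeats(seq, k=25):
    """Map each k-mer occurring more than once to its 0-based start positions.

    Only exact, same-strand (direct) repeats are reported.
    """
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


# Made-up 25-base "border" placed twice in a toy sequence (not a real border).
border = "ACGTTGCAAGCTTAGGCCATGATCG"
toy = "AAAA" + border + "CCCCCCCCCC" + border + "TTTT"
repeats = direct_repeats(toy)
```

Here the two copies of the made-up border begin at positions 4 and 39 of
the toy sequence.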

Ri plasmids integrate two separate T-DNAs, TL-DNA and TR-DNA, left
and right T-DNAs, respectively. TL (about 15-20 kbp) and TR (about
8-10 kbp) are separated by about 15-20 kbp (Huffman et al., supra;
Jouanin, supra). The region of agropine-type pRi and TR integrated can
vary between individual plants or species inoculated (F. F. White et al.
(1983) Nature 301:348-350; D. A. Tepfer (1984) Cell 37:959-967). Though
T-DNA is occasionally deleted after integration in the plant genome, it is
generally stable. Tumors containing a mixture of cells which differ in
T-DNA organization or copy number are the result of multiple
transformation events.

The exact location relative to the border repeats of T-DNA/flanking
plant DNA junctions varies and need not be within a border repeat.
Virulence is not always eliminated after deletion of either one of the
usual nopaline T-DNA border sequences (compare H. Joos et al. (1983) Cell
32:1057-1067 with K. Wang et al. (1984) Cell 38:455-462 and C. H. Shaw
et al. (1984) Nucl. Acids Res. 12:6031-6041, concerning the right
border). The orientation of the right nopaline border can be reversed
without total loss of functionality, and a single border sequence is
capable of transforming closely-linked sequences (M. De Block et al.
(1984) EMBO J. 3:1681-1689). A synthetic 25 bp nopaline right border
repeat is functional (Wang et al., supra). Circular intermediates
associated with T-DNA transfer appear to be spliced precisely within the
25 bp direct repeats (Z. Koukolikova-Nicola et al. (1985) Nature
313:191-196).

Manipulations of the TIP Plasmids 

Altered DNA sequences, including deletions, may be inserted into TIP
plasmids (see Shuttle Vectors). Some pTi derivatives can be transferred
to E. coli and mutagenized therein (J. Hille et al. (1983) J. Bacteriol.
154:693-701). P. Zambryski et al. (1983) EMBO J. 2:2143-2150, report use
of a vector deleted for most T-DNA genes to transform tobacco and
regenerate morphologically normal plants.
The nopaline synthase promoter can drive expression of drug
resistance structural genes useful for selection of transformed plant
cells. M. W. Bevan et al. (1983) Nature 304:184-187; R. T. Fraley et al.
(1983) Proc. Natl. Acad. Sci. USA 80:4803-4807; and L. Herrera-Estrella
et al. (1983) EMBO J. 2:987-995, have inserted the bacterial kanamycin
resistance structural gene (neomycin phosphotransferase II, NPT2), or kan,
from Tn5 downstream from (i.e. behind or under control of) the nopaline
synthase promoter. The constructions were used to transform plant cells
which in culture were resistant to kanamycin and its analogs such as
neomycin and G418. Promoters for octopine TL genes ORF24 and ORF25 can
also drive kan structural gene expression (J. Velten et al. (1984) EMBO J.
3:2723-2730). Herrera-Estrella et al., supra, reported a similar
construction, in which a methotrexate resistance gene (dihydrofolate
reductase, DHFR) from Tn7 was placed behind the nos promoter; transformed
plant cells were resistant to methotrexate. Furthermore,
L. Herrera-Estrella et al. (1983) Nature 303:209-213, have obtained
expression in plant cells of enzymatic activity of octopine synthase and
chloramphenicol acetyltransferase by placing their structural genes under
control of nos promoters. G. Helmer et al. (1984) Biotechnol. 2:520-527,
have created a fusion gene useful as a screenable marker having the
promoter and 5'-end of the nos structural gene fused to E. coli
β-galactosidase (lacZ) sequences.

N. Murai et al. (1983) Science 222:476-482, reported fusion of the
promoter and the 5'-end of the octopine synthase structural gene to a
phaseolin structural gene. The encoded fusion protein was produced under
control of the T-DNA promoter. Phaseolin-derived introns underwent proper
post-transcriptional processing.



SUMMARY OF THE INVENTION

One object of this invention is to provide means for promoting the
expression of structural genes within plant cells wherein said genes are
foreign to said cells. In pursuance of this goal, other objects are to
provide pRi T-DNA promoters and transcript terminators, and especially
pRi TL-DNA-derived promoters and pRi TL-DNA-derived polyadenylation sites,
which are DNA sequences capable of controlling structural gene
transcription and translation within plant cells, and to provide
developmental and phenotypic regulation of said foreign structural genes.
Another object is to provide specialized plant tissues and plants having
within them proteins encoded by foreign structural genes and, in cases
where the protein is an enzyme, having or lacking metabolites or chemicals
which respectively are not or are otherwise found in the cells in which
the gene is inserted. Other objects and advantages will become evident
from the following description.

The invention disclosed herein provides a plant comprising a
genetically modified plant cell having a foreign structural gene
introduced and expressed therein under control of pRi TL-DNA-derived
plant-expressible transcription controlling sequences (TxCS). Further, the
invention provides plant tissue comprising a plant cell whose genome
includes T-DNA comprising a foreign structural gene inserted in such
orientation and spacing with respect to pRi TL-DNA-derived
plant-expressible TxCS as to be expressible in the plant cell under
control of those sequences. Also provided are novel strains of bacteria
containing and replicating T-DNA, the T-DNA being modified to contain an
inserted foreign structural gene in such orientation and spacing with
respect to a T-DNA-derived, plant-expressible TxCS as to be expressible in
a plant cell under control of said TxCS. Additionally, the invention
provides novel vectors having the ability to replicate in E. coli and
comprising T-DNA, and further comprising a foreign structural gene
inserted within T-DNA contained within the vector, in such manner as to be
expressible in a plant cell under control of a pRi TL-DNA TxCS.
Furthermore, strains of bacteria harboring said vectors are disclosed.

Much is known about the location, size, and function of many
transcripts activated when A. tumefaciens T-DNA regions are transferred
into the genome of plants (see Background). Most pTi TL-DNA open reading
frames (ORFs) correlate with known gene products. However, until the
disclosure of the present invention, the art knew little about the number,
size, and function of genes activated when the TL-DNA regions from
A. rhizogenes plasmids, such as pRiA4, are transferred into a plant
genome. Agropine synthase, tms-1 and tms-2 genes have been identified by
homology with pTi T-DNA in Ri plasmids, but these loci are located in
pRi TR-DNA (G. A. Huffman et al. (1984) J. Bacteriol. 157:269-276;
L. Jouanin (1984) Plasmid 12:91-102). The experimental work presented
herein is believed to be the first disclosure of a pRi TL-DNA sequence or
of any sequence homologous thereto. The availability of this sequence
will enable and otherwise facilitate work in the art of plant
transformation to express foreign structural genes and to engage in other
manipulations of pRi TL-DNA and pRi TL-DNA-derived sequences. Without the
newly disclosed pRi TL-DNA sequence, those of ordinary skill in the art
would be unable to use promoters and polyadenylation sites contained
therein to promote transcription and translation in plant cells of foreign
structural genes. The disclosed sequence reveals the existence of
previously unknown T-DNA ORFs and associated transcription controlling
sequences, and makes possible construction of recombinant DNA molecules
using promoters and polyadenylation sites from pRi TL-DNA genes whose
sequences were hitherto unknown and unavailable to the public. The work
presented herein is also believed to be the first disclosure of
developmental and phenotypic regulation of T-DNA genes. Results newly
disclosed herein will allow those of ordinary skill in the art to use
T-DNA transcription controlling sequences which are so regulated to
express heterologous foreign structural genes in transformed plants. T-DNA
genes known to the art before the present disclosure are not known to be
so regulated. Furthermore, knowledge of the pRi TL-DNA sequence enables
one to bring to utility promoters and polyadenylation sites that are
presently unrecognized; in the future, should a new pRi TL-DNA transcript
be discovered and mapped, the sequence disclosed herein will permit
associated TxCSs to be combined with heterologous foreign structural
genes.
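
The step from a raw sequence to a list of candidate ORFs can be sketched
as a six-frame scan. The Python example below is an illustration under
stated assumptions, not the procedure used in the work described here: the
names find_orfs and revcomp, the ATG-to-stop definition of an ORF, and the
default minimum length of 100 codons are all choices made for the example.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100):
    """List (strand, start, end) for each ATG...stop ORF in all six frames.

    Coordinates are 0-based and end-exclusive, measured on the strand that
    was scanned ("+" is seq itself, "-" is its reverse complement).
    min_codons counts the start codon but not the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF in this frame
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs


# Toy sequence holding one five-codon ORF on the "+" strand.
toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
toy_orfs = find_orfs(toy, min_codons=3)
```

Each ORF reported this way could then be examined for flanking promoter
and polyadenylation signals; the length cutoff trades sensitivity against
the number of spurious short frames.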

The present invention comprises foreign structural genes under control
of pRi TL-DNA promoters expressible in plant cells, the promoter/gene
combination being inserted into a plant cell by any means known to the
art. More specifically, in its preferred embodiment the invention
disclosed herein comprises expression in plant cells of foreign structural
genes under control of certain pRi TL-DNA-derived plant-expressible TxCSs,
after introduction via T-DNA, that is to say, by inserting the foreign
structural gene into T-DNA under control of a pRi TL-DNA promoter and/or
ahead of a pRi TL-DNA polyadenylation site and introducing the T-DNA
containing the TxCS/structural gene combination into a plant cell using
known means. Once plant cells transformed to contain a foreign structural
gene expressible under control of a pRi TL-DNA TxCS are obtained, plant
tissues and whole plants can be regenerated therefrom using methods and
techniques well known in the art. The regenerated plants are then
reproduced by conventional means and the introduced genes can be
transferred to other strains and cultivars by conventional plant breeding
techniques. The invention in principle applies to any introduction of a
foreign structural gene combined with a pRi TL-DNA promoter or
polyadenylation site into any plant species into which foreign DNA (in the
preferred embodiment pTi T-DNA) can be introduced and maintained by any
means. In other words, the invention provides a means for expressing a
structural gene in a plant cell and is not restricted to any particular
means for introducing foreign DNA into a plant cell and maintaining the
DNA therein. Such means include, but are not limited to, T-DNA-based
vectors (including pTi-based vectors), viral vectors, minichromosomes,
non-T-DNA integrating vectors, and the like.

The invention is useful for genetically modifying plant cells, plant
tissues, and whole plants by inserting useful structural genes from other
species, organisms, or strains that change phenotypes of plants or plant
cells when expressed therein. Such useful structural genes include, but
are not limited to, genes conveying phenotypes such as improved tolerance
to extremes of heat or cold; improved tolerance to drought or osmotic
stress; improved resistance or tolerance to insect (e.g. insecticidal
toxins), arachnid, nematode, or epiphyte pests and fungal, bacterial, or
viral diseases, or the like; the production of enzymes or secondary
metabolites not normally found in said tissues or plants; improved
nutritional (e.g. storage proteins or lectins), flavor (e.g. sweet
proteins), or processing properties when used for fiber or human or animal
food; changed morphological traits or developmental patterns (e.g. leaf
hairs which protect the plant from insects, aesthetically pleasing
coloring or form, changed plant growth habits, dwarf plants, reduced time
needed for the plants to reach maturity, expression of a gene in a tissue
or at a time that gene is not usually expressed, and the like); male
sterility; improved photosynthetic efficiency (including lowered
photorespiration); improved nitrogen fixation; improved uptake of
nutrients; improved tolerance to herbicides; increased crop yield;
improved competition with other plants; and improved germplasm
identification by the presence of one or more characteristic nucleic acid
sequences, proteins, or gene products, or phenotypes however identified
(to distinguish a genetically modified plant of the present invention from
plants which are not so modified, to facilitate transfer of a linked
artificially introduced phenotype by other (e.g. sexual) means to other
genotypes or to facilitate identification of plants protected by patents
or by plant variety protection certificates); selectable markers (i.e.
genes conveying resistance in cell or tissue culture to selective agents);
screenable markers; and the like.

The invention is exemplified by introduction and expression of a
structural gene for phaseolin, the major seed storage protein of the bean
Phaseolus vulgaris L., into plant cells. The introduction and expression
of the structural gene for phaseolin, for example, can be used to enhance
the protein content and nutritional value of forage or other crops. The
invention is also exemplified by the introduction and expression of a
lectin structural gene, in this case also obtained from P. vulgaris, into
plant cells. The introduction and expression of a novel lectin may be
used to change the nutritional or symbiotic properties of a plant
tissue. The invention is exemplified in yet other embodiments by the
introduction and expression of DNA sequences encoding thaumatin, and its
precursors prothaumatin, prethaumatin, and preprothaumatin. Mature
thaumatin is a heat-labile, sweet-tasting protein found naturally in katemfe
(Thaumatococcus daniellii) which can be used to enhance the flavor of
vegetables which are eaten uncooked without significantly increasing the
caloric content of the vegetables. The invention is further exemplified
by introduction and expression of a structural gene for a crystal protein
from B. thuringiensis var. kurstaki HD-73 into plant cells. The
introduction and expression of the structural gene for an insecticidal protein
can be used to protect a crop from infestation with insect larvae of species
which include, but are not limited to, hornworm (Manduca sp.), pink
bollworm (Pectinophora gossypiella), European corn borer (Ostrinia
nubilalis), tobacco budworm (Heliothis virescens), and cabbage looper
(Trichoplusia ni). Applications of insecticidal protein prepared from
sporulating B. thuringiensis do not control insects such as the pink
bollworm in the field because of their particular life cycles and feeding
habits. A plant containing insecticidal protein in its tissues will
control this recalcitrant type of insect, thus providing advantage over prior
insecticidal uses of B. thuringiensis. By incorporation of the
insecticidal protein into the tissues of a plant, the present invention
additionally provides advantage over such prior uses by eliminating instances
of nonuniform application and the costs of buying and applying insecticidal
preparations to a field. Also, the present invention eliminates the need
for careful timing of application of such preparations since small larvae
are most sensitive to insecticidal protein and the protein is always
present, minimizing crop damage that would otherwise result from
preapplication larval foraging. Other uses of the invention, exploiting the
properties of other structural genes introduced into various plant species,
will be readily apparent to those skilled in the art.

DESCRIPTION OF THE DRAWINGS

Figure 1 presents maps of the TL-DNA of agropine Ri plasmid pRiHRI
and the strategy used for sequencing. The top line represents the TL-DNA
region from pRiHRI and the filled boxes indicate locations of ORFs 1 to
18. The left and right TL-DNA borders are those identified from analysis
of TL-DNA integrated into Convolvulus arvensis clone 7 tissue. ORF
polarities are indicated by the position of enclosed boxes on the
continuous line; above indicates transcription from left to right and below
indicates transcription right to left, i.e. having an mRNA sequence
complementary to that disclosed in Fig. 2. EcoRI and BamHI restriction maps
are below the ORF map. The complete nucleotide sequence of the TL-DNA was
determined from five subclones mapped below the restriction maps:
EcoRI 3a, BamHI 8a; Number 16, pLJO ("cosmid 40"); and EcoRI 3b (see
Example 2.2). Comparison of restriction enzyme site patterns (L. Jouanin
(1984) Plasmid 12:91-102) and overlapping nucleotide sequenced regions
(Number 16 and cosmid 40) indicates that pRiHRI and pRiA4 TL-DNAs are
essentially identical. Cleavage sites and direction of sequence analysis
are shown below each subclone, and horizontal arrows indicate direction
and distance of sequencing runs. Enzymes are abbreviated as follows:
A, AvaI; Ac, AccI; B, BamHI; Bg, BglII; C, ClaI; D, DraI; E, EcoRI;
H, HindIII; K, KpnI; MsI, MstI; MsII, MstII; Na, NarI; Nc, NcoI; Ps, PstI;
Pv, PvuII; Sa, SalI; St, StuI; Xb, XbaI; Xh, XhoI; Xm, XmnI; and
Xo, XorII.

Figure 2 presents the nucleotide sequence of the TL-DNA region from
A. rhizogenes agropine-type plasmid pRiHRI. The sequence starts 520 base
pairs (bp) to the left of the left TL-DNA/plant junction sequence
identified in C. arvensis clone 7 and extends 1135 bp to the right of the
clone 7 right TL-DNA/plant junction, a total of 21,126 bp.

Figure 3 is a schematic diagram, not drawn to scale, of the DNA
manipulation strategy utilized in the Examples. Sites susceptible to the
action of a restriction enzyme are indicated by that enzyme's name or
place of listing in a Table. For example, "T4c2" refers to an enzyme
listed in Table 4, column 2. A site that is no longer susceptible to the
enzyme is indicated by the presence of parentheses around the name of the
enzyme. The extent and polarity of an ORF is indicated by an arrow.
Names of plasmids and vectors, again sometimes designated by place of
listing in a Table (e.g. "T5c1" refers to a vector listed in Table 5,
column 1), are within the circular representations of the plasmids.
"Ex" refers to the Example which describes a particular manipulation.

DETAILED DESCRIPTION OF THE INVENTION 

The following terms are defined in order to remove ambiguities as to the
intent or scope of their usage in the Specification and Claims.

TxCS: Transcription controlling sequences refers to a promoter/transcript
terminator combination flanking a particular structural gene or
open reading frame (ORF). The promoter and transcript terminator DNA
sequences flanking a particular inserted foreign structural gene need not
be derived from the same source genes (e.g. pairing two different
pRi TL-DNA genes) or the same taxonomic source (e.g. pairing sequences
from pRi TL-DNA with sequences from non-pRi-TL-DNA sources such as other
types of T-DNA, plants, animals, fungi, yeasts, and eukaryotic viruses).
Therefore the term TxCS refers to a combination of a claimed promoter
with an unclaimed transcript terminator, a combination of an unclaimed
promoter with a claimed polyadenylation site, or a combination of a
promoter and a polyadenylation site which are both claimed. Examples of
non-pRi-TL-DNA plant-expressible promoters which can be used in conjunction
with a pRi TL-DNA polyadenylation site include, but are not limited to,
those from genes for nos, ocs, phaseolin, RuBP-Case small subunit and the
19S and 35S transcripts of cauliflower mosaic virus (CaMV).

Promoter: Refers to sequences at the 5'-end of a structural gene
involved in initiation of translation or transcription. Expression under
control of a pRi T-DNA promoter may take the form of direct expression in
which the structural gene normally controlled by the promoter is removed
in part or in whole and replaced by the inserted foreign structural gene,
a start codon being provided either as a remnant of the pRi T-DNA
structural gene or as part of the inserted structural gene, or by fusion
protein expression in which part or all of the structural gene is inserted in
correct reading frame phase within the existing pRi T-DNA structural
gene. In the latter case, the expression product is referred to as a
fusion protein. The promoter segment may itself be a composite of
segments derived from a plurality of sources, naturally occurring or
synthetic. Eukaryotic promoters are commonly recognized by the presence of
DNA sequences homologous to the canonical form 5'...TATAA...3' about
10-30 bp 5' to the location of the 5'-end of the mRNA (cap site). About
30 bp 5' to the TATAA another promoter sequence is often found which is
recognized by the presence of DNA sequences homologous to the canonical
form 5'...CCAAT...3'. Translational initiation often begins at the first
5'...AUG...3' 3' from the cap site (see Example 1.5).
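
The sequence landmarks named in this definition can be illustrated with a
minimal Python sketch, offered only as an aid to the reader. It scans a DNA
strand for exact copies of the canonical TATAA and CCAAT forms 5' to an
assumed cap site and reports the first ATG 3' to that site. The function
names, the example sequence, and the cap-site coordinate are hypothetical
and are not taken from the disclosed sequence; real promoters are
recognized by homology to the canonical forms, so an exact-match search is
only a first approximation.

```python
def find_motif(seq, motif):
    """Return 0-based start positions of every exact occurrence of motif."""
    seq, motif = seq.upper(), motif.upper()
    hits, start = [], seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def promoter_landmarks(seq, cap_site):
    """Locate canonical promoter landmarks relative to an assumed cap site.

    cap_site is the 0-based index of the first transcribed base (an
    assumption supplied by the caller, not something this code infers).
    """
    upstream = seq[:cap_site]
    first_atg = seq.upper().find("ATG", cap_site)
    return {
        "TATAA": find_motif(upstream, "TATAA"),
        "CCAAT": find_motif(upstream, "CCAAT"),
        "first_ATG": first_atg if first_atg != -1 else None,
    }


# Hypothetical sequence for illustration only.
example = "GGCCAATCT" + "G" * 30 + "TATAAATA" + "C" * 20 + "ACGTTCATGGCT"
print(promoter_landmarks(example, cap_site=67))
```

In this made-up sequence the TATAA lies 28 bp 5' to the assumed cap site
and the CCAAT lies 37 bp 5' to the TATAA, roughly matching the spacings
described above.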

Transcript terminator: Refers to any nucleic acid sequence capable
of determining the 3'-end of a eukaryotic messenger RNA (mRNA). The
transcript terminator DNA segment may itself be a composite of segments
derived from a plurality of sources, naturally occurring or synthetic, and
may be from a genomic DNA or an RNA-derived cDNA. Some eukaryotic RNAs,
e.g. histone mRNA (P. A. Krieg and D. A. Melton (1984) Nature 308:203-
206), ribosomal RNA, and transfer RNA, are not 3'-terminated by
polyadenylic acid or by polyadenylation sites; it is intended that the term
transcript terminator include, but not be limited to, both nucleic acid
sequences determining the 3'-ends of such transcripts and polyadenylation
site sequences (see below).

Polyadenylation site: Refers to any nucleic acid sequence capable of
determining the 3'-end of a eukaryotic polyadenylated mRNA. After
transcriptional termination polyadenylic acid "tails" are added to the 3'-end
of most mRNA precursors. The polyadenylation site DNA segment may itself
be a composite of segments derived from a plurality of sources, naturally
occurring or synthetic, and may be from a genomic DNA or an mRNA-derived
cDNA. Polyadenylation sites are commonly recognized by the presence of
homology to the canonical form 5'...AATAAA...3', although variation of
distance, partial "read-thru", and multiple tandem canonical sequences are
not uncommon. It should be recognized that a canonical "polyadenylation
site" may in fact not actually cause polyadenylation per se (N. Proudfoot
(1984) Nature 307:412-413) and that sequences 3' to the "AATAAA" and the
3'-end of the transcript may be needed (A. Gil and N. J. Proudfoot (1984)
Nature 312:473-474).

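
As a rough illustration of how canonical polyadenylation signals might be
located, the following Python sketch lists every AATAAA found 3' to the end
of a coding sequence, including overlapping or tandem copies. The function
name, the example sequence, and the stop-codon coordinate are hypothetical.
As noted above, the presence of a canonical sequence does not by itself
establish that polyadenylation occurs there, so the output should be read
as a list of candidates only.

```python
import re


def polyadenylation_signals(seq, stop_codon_end):
    """List canonical AATAAA signals 3' to the end of a coding sequence.

    stop_codon_end is the 0-based index of the first base after the stop
    codon. Returns (position, distance_from_stop) pairs.
    """
    downstream = seq.upper()[stop_codon_end:]
    # A lookahead reports overlapping or tandem signals separately.
    return [
        (stop_codon_end + m.start(), m.start())
        for m in re.finditer(r"(?=AATAAA)", downstream)
    ]


# Hypothetical 3' region for illustration only.
three_prime = "ATGGCTTAA" + "GCGC" * 5 + "AATAAA" + "TTGC" * 3 + "AATAAATAAA"
print(polyadenylation_signals(three_prime, stop_codon_end=9))
```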
Foreign structural gene: As used herein includes that portion of a
gene comprising a DNA segment coding for a foreign RNA, protein,
polypeptide or portion thereof, possibly including a translational start codon,
but lacking at least one other functional element of a TxCS that regulates
initiation or termination of transcription and initiation of translation,
commonly referred to as the promoter region and transcript terminator. As
used herein, the term foreign structural gene does not include pRi TL-DNA
structural genes unless the structural gene and pRi TL-DNA transcription
controlling sequences combined with the structural gene are derived from
different pRi TL-DNA genes; i.e. unless the structural gene and either a
pRi promoter or a pRi polyadenylation site combined with the structural
gene are heterologous. (Note that such foreign functional elements may be
present after combination of the foreign structural gene with a pRi TL-DNA
TxCS, though, in embodiments of the present invention, such elements may
not be functional in plant cells). A foreign structural gene may encode a
protein not normally found in the plant cell in which the gene is
introduced. Additionally, the term refers to copies of a structural gene
naturally found within the cell but artificially introduced. A foreign
structural gene may be derived in whole or in part from sources including
but not limited to eukaryotic DNA, prokaryotic DNA, episomal DNA, plasmid
DNA, plastid DNA, genomic DNA, cDNA, viral DNA, viral cDNA, or chemically
synthesized DNA. It is further contemplated that a foreign structural
gene may contain one or more modifications in either the coding segments
or untranslated regions which could affect the biological activity or
chemical structure of the expression product, the rate of expression or
the manner of expression control. Such modifications include, but are not
limited to, mutations, insertions, deletions, and substitutions of one or
more nucleotides, and "silent" modifications that do not alter the
chemical structure of the expression product but which affect
intercellular localization, transport, excretion or stability of the expression
product. The structural gene may constitute an uninterrupted coding
sequence or it may include one or more introns, bounded by the appropriate
plant functional splice junctions, which may be obtained from a synthetic
or a naturally occurring source. The structural gene may be a composite of
segments derived from a plurality of sources, naturally occurring or
synthetic, coding for a composite protein, the composite protein being
foreign to the cell into which the gene is introduced and expressed or
being derived in part from a foreign protein. The foreign structural gene
may encode a fusion protein, and in particular, may be fused to all or
part of a structural gene derived from the same ORF as was the TxCS.

Plant tissue: Includes differentiated and undifferentiated tissues
of plants including, but not limited to, roots, shoots, pollen, seeds,
tumor tissue, such as crown galls, and various forms of aggregations of
plant cells in culture, such as embryos and calluses. The plant tissue
may be in planta or in organ, tissue, or cell culture.

Plant cell: As used herein includes plant cells in planta and plant
cells and protoplasts in culture.

Production of a genetically modified plant, plant seed, plant tissue,
or plant cell expressing a foreign structural gene under control of a pRi
T-DNA TxCS, and especially a pRi TL-DNA-derived TxCS, combines the
specific teachings of the present disclosure with a variety of techniques
and expedients known in the art. In most instances, alternative
expedients exist for each stage of the overall process. The choice of
expedients depends on variables such as the choice of the basic vector system
for the introduction and stable maintenance of the pRi TL-DNA
TxCS/structural gene combination, the plant species to be modified and the
desired regeneration strategy, and the particular foreign structural gene
to be used, all of which present alternative process steps which those of
ordinary skill are able to select and use to achieve a desired result.
For instance, although the starting point for obtaining pRi TL-DNA TxCSs
is exemplified in the present application by pRi TL-DNA isolated from
pRiA4 and pRiHRI, DNA sequences of other homologous agropine-type Ri
plasmids might be substituted as long as appropriate modifications are
made to the TxCS isolation and manipulation procedures. Additionally,
T-DNA genes from other types of pRi TL-DNA homologous to the agropine-type
pRi TL-DNA genes having TxCSs disclosed herein may be substituted, again
with appropriate modifications of procedural details. Homologous genes
may be identified by those of ordinary skill in the art by the ability of
their nucleic acids to cross-hybridize under conditions of stringency
appropriate to detect 70% homology; such conditions are well understood in
the art. It will be understood that there may be minor sequence
variations within gene sequences utilized or disclosed in the present
application. These variations may be determined by standard techniques to enable
those of ordinary skill in the art to manipulate and bring into utility
the T-DNA promoters and transcript terminators of such homologous genes.
(Homologs of foreign structural genes may be identified, isolated,
sequenced, and manipulated in a manner similar to that used for homologs
of the pRi genes of the present invention.) As novel means are developed
for the stable insertion of foreign genes in plant cells, those of ordinary
skill in the art will be able to select among those alternate process steps
to achieve a desired result. The fundamental aspects of the invention are
the nature and structure of pRi T-DNA genes and their use as a means for
expression of a foreign structural gene in a plant genome. The remaining
steps of the preferred embodiment for obtaining a genetically modified
plant include inserting the pRi TL-DNA TxCS/structural gene combination
into T-DNA, transferring the modified T-DNA to a plant cell wherein the
modified T-DNA becomes stably integrated as part of the plant cell genome,
techniques for in vitro culture and eventual regeneration into whole
plants, which may include steps for selecting and detecting transformed
plant cells and steps of transferring the introduced gene from the
originally transformed strain into commercially acceptable cultivars.
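
Where sequence data are available for a candidate homolog, the 70% figure
mentioned above can be approximated computationally. The following Python
sketch is an assumption-laden illustration and not a substitute for the
hybridization test described in the text: it computes percent identity
between two sequences that have already been aligned (the alignment step
itself is not shown), and the function names and example strings are
hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical positions between two pre-aligned sequences.

    Gap characters ('-') count as mismatches; columns in which both
    sequences carry a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_homology_threshold(aligned_a, aligned_b, threshold=70.0):
    """True if percent identity reaches the chosen threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical aligned fragments: 8 of 10 positions match.
print(percent_identity("ATGGCTAAGC", "ATGGATAAGT"))
print(meets_homology_threshold("ATGGCTAAGC", "ATGGATAAGT"))
```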

An advantage, which will be readily understood by those skilled in
the art, of use of transcription controlling sequences disclosed herein
for controlling structural gene expression over previously published T-DNA
TxCSs is that transcription of many pRi T-DNA ORFs is phenotypically and
developmentally regulated (see Example 1.9). pTi T-DNA genes are not
known to be so regulated. Transcripts of ORFs 8, 11, 13, and 15 are more
prevalent in roots than leaves, with the case of ORF 15 being particularly
striking, while ORF 12 expression is specific to leaves and to a
particular phenotype (T', see Example 1.9). Therefore, choice of a particular
pRi TL-DNA TxCS allows modulation of expression of a structural gene with
which the TxCS is combined. For example, should one want expression of a
structural gene to be much higher in roots than leaves, ORF 15 provides the
TxCS of choice.
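
The selection logic described in the preceding paragraph can be written
down as a small lookup, sketched here in Python. The ORF numbers and their
tissue preferences are those stated above; the category labels and the
function name are hypothetical conveniences, and the table carries no
quantitative expression data.

```python
# Qualitative expression patterns as summarized in the text above.
# The string labels are illustrative names, not terms from the disclosure.
ORF_EXPRESSION = {
    8: "root-preferential",
    11: "root-preferential",
    13: "root-preferential",
    15: "root-preferential",
    12: "leaf-specific",
}


def candidate_orfs(pattern):
    """Return ORF numbers whose TxCS matches the desired pattern."""
    return sorted(orf for orf, p in ORF_EXPRESSION.items() if p == pattern)


print(candidate_orfs("root-preferential"))
print(candidate_orfs("leaf-specific"))
```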

A principal feature of the present invention in its preferred
embodiment is the construction of T-DNA having an inserted foreign structural
gene under control of a pRi TL-DNA TxCS, i.e., between a promoter and a
polyadenylation site, as these terms have been defined, supra, at least
one of which is derived from pRi TL-DNA. The structural gene must be
inserted in correct position and orientation with respect to the desired
pRi TL-DNA promoter. Position has two aspects. The first relates to
which side of the promoter the structural gene is inserted. It is known
that the majority of promoters control initiation of transcription and
translation in one direction only along the DNA. The region of DNA lying
under promoter control is said to lie "downstream" or alternatively
"behind" or "3' to" the promoter. Therefore, to be controlled by the
promoter, the correct position of foreign structural gene insertion must
be "downstream" from the promoter. The second aspect of position refers
to the distance, in base pairs, between known functional elements of the
promoter, for example the transcription initiation site, and the
translational start site of the structural gene. Substantial variation appears
to exist with regard to this distance, from promoter to promoter.
Therefore, the structural requirements in this regard are best described in
functional terms. As a first approximation, reasonable operability can be
obtained when the distance between the promoter and the inserted foreign
structural gene is similar to the distance between the promoter and the
T-DNA gene it normally controls. Orientation refers to the directionality
of the structural gene. That portion of a structural gene which
ultimately codes for the amino terminus of the foreign protein is termed the
5'-end of the structural gene, while that end which codes for amino acids
near the carboxyl end of the protein is termed the 3'-end of the
structural gene. Correct orientation of the foreign structural gene is with
the 5'-end thereof proximal to the promoter. An additional requirement in
the case of constructions leading to fusion protein expression is that the
insertion of the foreign structural gene into the pRi TL-DNA
promoter-donated structural gene sequence must be such that the coding sequences
of the two genes are in the same reading frame phase, a structural
requirement which is well understood in the art. An exception to this
requirement exists in the case where an intron separates coding sequences
derived from a foreign structural gene from the coding sequences of the
pRi TL-DNA structural gene. In that case, both structural genes must be
provided with compatible splice sites, and the intron splice sites must be so
positioned that the correct reading frames for the pRi TL-DNA
promoter-donated structural gene and the foreign structural gene are restored in
phase after the intron is removed by post-transcriptional processing.
Differences in rates of expression or developmental control may be
observed when a given foreign structural gene is inserted under control of
different pRi TL-DNA TxCSs. Rates of expression may also be greatly
influenced by the details of the resultant mRNA's secondary structure,
especially stem-loop structures. Stability, ability to be excreted,
intercellular localization, intracellular localization, solubility, target
specificity, and other functional properties of the expressed protein
itself may be observed to vary in the case of fusion proteins, depending
upon the insertion site, the length and properties of the segment of
pRi TL-DNA protein included within the fusion protein and mutual
interactions between the components of the fusion protein that affect the
folded configuration thereof, all of which present numerous opportunities
to manipulate and control the functional properties of the foreign protein
product, depending upon the desired physiological properties within the
plant cell, plant tissue, and whole plant. Similarly to the promoter, the
polyadenylation site must be located in correct position and orientation
relative to the 3'-end of the coding sequence. Fusion proteins are also
possible between the 3'-end of the foreign structural gene protein and a
polypeptide encoded by the DNA which serves as a source of the
polyadenylation site.

A TxCS comprises two major functionalities: a promoter, which
is absolutely necessary for gene expression, and a transcript terminator,
being in the preferred embodiment a polyadenylation site, positioned
respectively 5' and 3' to the structural gene. Although as exemplified
herein these two portions of the TxCS are obtained from the same gene,
this is not a requirement of the present invention. These 5' and 3'
sequences may be obtained from diverse pRi T-DNA genes, especially
pRi TL-DNA genes, or one of these sequences may even be obtained from a
non-pRi T-DNA gene. For instance, a promoter may be taken from a
pRi TL-DNA gene while the polyadenylation site may come from a plant gene.
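
The reading frame requirement stated above for fusion protein
constructions reduces to simple modular arithmetic, sketched below in
Python. The function and parameter names are hypothetical. The sketch
assumes that coding bases are counted from the A of the ATG start codon of
the promoter-donated gene and, for the intron case, that only exon bases
remaining after splicing are counted.

```python
def junction_phase(donor_coding_bases):
    """Codon phase (0, 1, or 2) reached at the fusion junction.

    donor_coding_bases: number of coding bases of the promoter-donated
    structural gene retained 5' to the junction, counted from the A of
    its ATG. Intron bases, being removed by splicing, are not counted.
    """
    return donor_coding_bases % 3


def fusion_in_frame(donor_coding_bases, foreign_leading_bases=0):
    """True if the foreign codons are read in phase after the junction.

    foreign_leading_bases: bases lying between the junction and the
    first base of the first complete foreign codon (for example, bases
    contributed by a linker or a restriction site remnant).
    """
    return (donor_coding_bases + foreign_leading_bases) % 3 == 0


# 30 donor bases end on a codon boundary, so a directly joined foreign
# coding sequence is in frame; 31 bases leave it one base out of phase,
# which two extra leading bases would correct.
print(fusion_in_frame(30))
print(fusion_in_frame(31))
print(fusion_in_frame(31, foreign_leading_bases=2))
```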

In the Examples, a foreign structural gene is nested within a
pRi TL-DNA TxCS, suturing the structural gene into the TxCS at NdeI sites
and placing the entire TxCS/structural gene combination between a pair of
BamHI sites. As will be apparent to those of ordinary skill in the art,
the TxCS/gene combination may be placed between any restriction sites
convenient for removing the combination from the plasmid it is carried on
and convenient for insertion into the plant transformation or shuttle
vector of choice. Alternatives to the use of paired NdeI sites
(5'...CATATG...3') at the ATG translational start include, but are not
limited to, use of ClaI (5'...(not G)ATCGAT(G)...3') or NcoI
(5'...CCATGG...3') sites. As will be understood by persons skilled in the
art, other sites may be used for the promoter/structural gene suture as
long as the sequence at the junction remains compatible with translational
and transcriptional functions. An alternative to the suture of the
promoter to the foreign structural gene at the ATG translational start is
suturing at the transcriptional start or cap site. An advantage,
especially for eukaryotic structural genes, of the use of this location is
that the secondary (stem-loop) structure of the foreign structural gene
mRNA will not be disrupted, thereby leading to an mRNA having translational
activity more nearly resembling the activity observed in the organism
which was the source of the gene. The restriction sites at the 5'- and
3'-ends of the structural gene need not be compatible. Use of sites
cut by two different restriction enzymes at the two TxCS/structural gene
junctions will automatically correctly orient the structural gene when it
is inserted between the TxCS elements, though use of an extra restriction
enzyme may necessitate removal of an additional set of inconvenient
restriction sites within the TxCS and the structural gene. The use of a
single restriction enzyme to link both a promoter and a polyadenylation
site to a particular structural gene is not required. Convenient sites
within the pRi TL-DNA structural gene and 3' to the translational stop of
the foreign structural gene may be used. When these sites have
incompatible ends, they may be converted to blunt-ends by methods well known in
the art and blunt-end ligated together.
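
Whether a given translational start already lies within one of the
recognition sequences named above can be checked with a short Python
sketch. Only the NdeI and NcoI sequences quoted in the text are used; the
function name, dictionary name, and example sequences are hypothetical,
and the ClaI case is omitted because its flanking-base condition would
need separate handling.

```python
# Recognition sequences quoted in the text; each contains an ATG.
START_SITE_ENZYMES = {"NdeI": "CATATG", "NcoI": "CCATGG"}


def start_codon_sites(seq, atg_index):
    """Name the enzymes whose site includes the ATG at atg_index.

    atg_index is the 0-based position of the A of the start codon.
    """
    seq = seq.upper()
    found = []
    for name, site in START_SITE_ENZYMES.items():
        offset = site.index("ATG")  # position of the ATG within the site
        begin = atg_index - offset
        if begin >= 0 and seq[begin:begin + len(site)] == site:
            found.append(name)
    return found


# Hypothetical sequences for illustration only.
print(start_codon_sites("GGACATATGGCT", atg_index=6))
print(start_codon_sites("TTCCATGGCA", atg_index=4))
print(start_codon_sites("AAAATGAAA", atg_index=3))
```

A start codon that returns an empty list would need a site to be
introduced, or a different suture point chosen, before the constructions
described above could be applied to it.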

Location of the TxCS/foreign structural gene combination insertion
site within T-DNA or a T-DNA-derived vector is not critical as long as the
transfer function of the T-DNA borders and any other necessary vector
elements (e.g. a selectable or screenable marker) are not disrupted. The
T-DNA into which the TxCS/structural gene combination is inserted may be
obtained from any of the TIP plasmids, including both Ti and Ri
plasmids. The TxCS/structural gene combination is inserted by standard
techniques well known to those skilled in the art. The orientation of the
inserted plant gene, with respect to the direction of transcription and
translation of endogenous T-DNA or vector genes, is not critical; either of
the two possible orientations is functional. Differences in rates of
expression might be observed when a given gene is inserted at different
locations within T-DNA.

A convenient means for inserting a TxCS/foreign structural gene
combination into T-DNA involves the use of a shuttle vector, as described in
the Background. An Agrobacterium strain transformed by a shuttle vector
is preferably grown under conditions which permit selection of a
double-homologous recombination event which results in replacement of a
pre-existing segment of a Ti or Ri plasmid with a segment of T-DNA of the
shuttle vector. However, it should be noted that the present invention is
not limited to the introduction of the TxCS/structural gene combination
into T-DNA by a double homologous recombination mechanism; a homologous
recombination event with a shuttle vector (perhaps having only a single
continuous region of homology with the T-DNA) at a single site will also
prove an effective means for inserting that combination into T-DNA, as will
insertion of a combination-carrying bacterial transposon.

An alternative to the shuttle vector strategy involves the use of
plasmids comprising T-DNA or modified T-DNA, into which a TxCS/foreign
structural gene is inserted, said plasmids lacking vir genes and being
capable of independent replication in an Agrobacterium strain. As
reviewed in the Background, the T-DNA of such plasmids can be transferred
from an Agrobacterium strain (e.g. A. rhizogenes, A. tumefaciens, or
derivatives thereof) to a plant cell provided the Agrobacterium strain
contains certain trans-acting vir genes whose function is to promote the
transfer of T-DNA to a plant cell. Plasmids that contain T-DNA and are
able to replicate independently in an Agrobacterium strain are herein
termed "sub-TIP" plasmids. A spectrum of variations is possible in which
the sub-TIP plasmids, which may be derived from Ri or Ti plasmids, differ
in the amount of T-DNA contained. A "mini-TIP" plasmid retains all of the
T-DNA from a TIP. "Micro-TIP" plasmids are deleted for all T-DNA but that
surrounding the T-DNA borders, the remaining portions being the minimum
necessary for the sub-TIP plasmid to be transferable and integratable in
the host cell. Sub-TIP plasmids are advantageous in that they are
relatively small and relatively easy to manipulate directly, eliminating the
need to transfer the gene to T-DNA from a shuttle vector by homologous
recombination. After the desired structural gene has been inserted, they
can easily be introduced directly into an Agrobacterium cell containing the
trans-acting genes that promote T-DNA transfer. Introduction into an
Agrobacterium strain is conveniently accomplished either by transformation
of the Agrobacterium strain or by conjugal transfer from a donor bacterial
cell, the techniques for which are well known to those of ordinary
skill.

pRi T-DNA TxCS/structural gene combinations may be combined with 
pTi-derived Ti plasmids or sub-TIP vectors. 

Modified T-DNA carrying a pRi TL-DNA TxCS/structural gene combination
can be transferred to plant cells by any technique known in the art (see
Background). The resultant transformed cells must be selected or screened
to distinguish them from untransformed cells. Selection is most readily
accomplished by providing a selectable marker known to the art
incorporated into the T-DNA in addition to the TxCS/foreign structural gene
combination. Indeed, a pRi TL-DNA TxCS can be a component of such a
marker. In addition, the T-DNA provides endogenous markers such as the
gene or genes controlling hormone-independent growth of Ti-induced tumors
in culture, the gene or genes controlling abnormal morphology of
Ri-induced tumor roots, and genes that control resistance to toxic compounds
such as amino acid analogs, such resistance being provided by an opine
synthase (e.g. ocs). Screening methods well known to those skilled in the
art include assays for opine production, specific hybridization to
characteristic RNA or T-DNA sequences, or immunological assays. Additionally,
the phenotype of an expressed foreign gene can be used to identify
transformed plant tissue (e.g. insecticidal properties of the crystal protein).

Although the preferred embodiment of this invention uses a
T-DNA-based Agrobacterium-mediated system for incorporation of the TxCS/foreign
structural gene combination into the genome of the plant which is to be
transformed, other means for transferring and incorporating the gene are
also included within the scope of this invention. Other means for the
stable incorporation of the combination into a plant genome additionally
include, but are not limited to, use of vectors based upon viral genomes
(e.g. see N. Brisson et al. (1984) Nature 310:511-514), minichromosomes,
transposons, and homologous or nonhomologous recombination into plant
chromosomes. Alternate forms of delivery of these vectors into a plant
cell additionally include, but are not limited to, direct uptake of
nucleic acid (e.g. see J. Paszkowski et al. (1984) EMBO J. 3:2717-2722),
fusion with vector-containing liposomes or bacterial spheroplasts,
microinjection, and encapsidation in viral coat protein followed by an
infection-like process. After introduction into a plant cell of a pRi TL-DNA
TxCS/structural gene combination, the combination will be contained by a
plant cell. Furthermore, the combination will be flanked by plant DNA,
unless utilizing a nonintegrating vector, e.g. a virus or minichromosome.

Regeneration of transformed cells and tissues is accomplished by
resort to known techniques. An object of the regeneration step is to
obtain a whole plant that grows and reproduces normally but which retains
integrated T-DNA. The techniques of regeneration vary somewhat according
to principles known in the art, depending upon the origin of the T-DNA,
the nature of any modifications thereto, and the species of the transformed
plant. In many plant species, cells transformed by pRi-type T-DNA are
readily regenerated, using techniques well known to those of ordinary
skill, without undue experimentation. Plant cells transformed by pTi-type
T-DNA can be regenerated, in some instances, by the proper manipulation of
hormone levels in culture. Preferably, however, the Ti-transformed tissue
is most easily regenerated if the T-DNA has been mutated in one or both of
the tmr and tms genes. It is important to note that if the mutations in
tmr and tms are introduced into T-DNA by double homologous recombination
with a shuttle vector, the incorporation of the mutation must be selected
in a different manner than the incorporation of the TxCS/structural gene
combination; e.g. one might select for tmr and tms inactivation by
chloramphenicol resistance while one might select for TxCS/foreign gene
integration by kanamycin resistance. The inactivation of the tms and tmr loci
may be accomplished by an insertion, deletion, or substitution of one or
more nucleotides within the coding regions or promoters of these genes,
the mutation being designed to inactivate the promoter or disrupt the
structure of the encoded proteins (e.g. the T-DNA of NRRL B-15821, or the
pTi of A3004, L. W. Ream et al. (1983) Proc. Natl. Acad. Sci. USA
80:1660-1664). Resultant transformed cells are able to regenerate plants
which carry integrated T-DNA and express T-DNA genes, such as an opine
synthase, and also express an inserted pRi TL-DNA TxCS/structural gene
combination. These serve as parental plant material for normal progeny
plants carrying and expressing the pRi TL-DNA TxCS/heterologous foreign
structural gene combination, and for seeds containing the combination, in
the preferred embodiments the combination being integrated into a plant
chromosome and flanked by plant DNA.

The genotype of the plant tissue transformed is often chosen for the
ease with which its cells can be grown and regenerated in in vitro culture
and for susceptibility to the selective agent to be used. Should a
cultivar of agronomic interest be unsuitable for these manipulations, a more
amenable variety is first transformed. After regeneration, the newly
introduced TxCS/foreign structural gene combination is readily transferred
to the desired agronomic cultivar by techniques well known to those
skilled in the arts of plant breeding and plant genetics. Sexual crosses
of transformed plants with the agronomic cultivars yield initial
hybrids. These hybrids can then be back-crossed with plants of the desired
genetic background. Progeny are continuously screened and selected for
the continued presence of integrated T-DNA or for the new phenotype
resulting from expression of the inserted foreign gene. In this manner, after
a number of rounds of back-crossing and selection, plants can be produced
having a genotype essentially identical to the agronomically desired
parents with the addition of an inserted pRi T-DNA promoter/foreign
structural gene combination or of a foreign structural gene/polyadenylation
site combination.
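
The need for a number of rounds of back-crossing can be illustrated numerically. The following Python sketch is offered only as an illustration; it assumes the textbook expectation that, in the absence of selection and linkage, each back-cross halves the remaining contribution of the non-recurrent (transformed) parent. The function name is hypothetical.

```python
def recurrent_parent_fraction(backcrosses: int) -> float:
    """Expected fraction of the genome derived from the recurrent
    (agronomic) parent after the initial cross plus `backcrosses`
    rounds of back-crossing, assuming unlinked loci and no selection."""
    if backcrosses < 0:
        raise ValueError("number of back-crosses cannot be negative")
    # The initial hybrid carries 1/2; each back-cross halves the remainder.
    return 1.0 - 0.5 ** (backcrosses + 1)


for n in range(6):
    print(n, recurrent_parent_fraction(n))
```

Under these assumptions the initial hybrid is expected to carry half of the recurrent parent's genome, and five back-crosses bring the expectation to roughly 98 percent; the inserted combination itself is retained only because progeny are screened for it at each round.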



EXAMPLES 

The following Examples are presented for the purpose of illustrating
specific embodiments within the scope of the present invention without
limiting the scope, the scope being defined by the Claims. Numerous
variations will be readily apparent to those of ordinary skill in the art.

These Examples utilize many techniques well known and accessible to
those skilled in the arts of molecular biology and manipulation of TIPs
and Agrobacterium; such methods are fully described in one or more of the
cited references if not described in detail herein. Enzymes are obtained
from commercial sources and are used according to the vendor's
recommendations or other variations known to the art. Reagents, buffers, and
culture conditions are also known to those in the art. Reference works
containing such standard techniques include the following: R. Wu, ed. (1979)
Meth. Enzymol. 68; R. Wu et al., eds. (1983) Meth. Enzymol. 100 and 101;
L. Grossman and K. Moldave, eds. (1980) Meth. Enzymol. 65; J. H. Miller
(1972) Experiments in Molecular Genetics; R. Davis et al. (1980) Advanced
Bacterial Genetics; R. F. Schleif and P. C. Wensink (1982) Practical
Methods in Molecular Biology; and T. Maniatis et al. (1982) Molecular
Cloning. Additionally, R. F. Lathe et al. (1983) Genet. Engin. 4:1-56,
make useful comments on DNA manipulations.

Textual use of the name of a restriction endonuclease in isolation,
e.g. "BclI", refers to use of that enzyme in an enzymatic digestion,
except in a diagram where it can refer to the site of a sequence
susceptible to action of that enzyme, e.g. a restriction site. In the text,
restriction sites are indicated by the additional use of the word "site",
e.g. "BclI site". The additional use of the word "fragment", e.g. "BclI
fragment", indicates a linear double-stranded DNA molecule having ends
generated by action of the named enzyme (e.g. a restriction fragment). A
phrase such as "BclI/SmaI fragment" indicates that the restriction
fragment was generated by the action of two different enzymes, here BclI and
SmaI, the two ends resulting from the action of different enzymes. Note
that the ends will have the characteristics of being "sticky" (i.e. having
a single-stranded protrusion capable of base-pairing with a complementary
single-stranded oligonucleotide) or "blunt" and that the sequence of a
sticky-end will be determined by the specificity of the enzyme which
produces it.

In the Examples and Tables, the underlining of a particular
nucleotide in a primer or other sequence indicates the nucleotide which differs
from the naturally found sequence, being an insertion or substitution of
one or more nucleotides. The use of lower case for two adjacent
nucleotides brackets one or more nucleotides that have been deleted from the
native sequence. Unless otherwise noted, all oligonucleotide primers are
phosphorylated at their 5'-ends, are represented 5'-to-3', and are
synthesized and used as referenced in Example 5.

Plasmids are usually prefaced with a "p", e.g., pRiA4 or p8.8, and
strain designations parenthetically indicate a plasmid harbored within, e.g.,
A. rhizogenes (pRiA4) or E. coli HB101 (p8.8). Self-replicating DNA
molecules derived from the bacteriophage M13 are prefaced by an "m", e.g.
mWB2341, and may be in either single-stranded or double-stranded form.
A. tumefaciens (pTi15955) is on deposit as ATCC 15955, E. coli C600
(pRK-203-Kan-103-Lec) as NRRL B-15821, E. coli HB101 (pLJ40) as NRRL
B-15957, and E. coli HB101 (EcoRI e36) as NRRL B-15958 (as deposited, EcoRI
e36 was designated EcoRI 3a); other deposited strains are listed in column
3 of Table 7.

The DNA constructions described in these Examples have been designed
to enable any one of the eukaryotic TxCSs of pRi TL-DNA to be combined
with any of four foreign structural genes. Towards that end, the
structural genes, the TxCSs, and the TxCS/structural gene combinations have
been placed on DNA "cassettes", having the properties that, after initial
modifications have been made, any structural gene may be readily inserted
into any TxCS without further modification, and any TxCS/structural gene
combination may be isolated by a simple procedure applicable to all such
combinations. All combinations are thereby equivalent when being inserted
into the plant transformation vector of choice. The initial modifications
of the TxCSs are all analogous to each other and the initial modifications
of the structural genes are also all analogous to each other. These
Examples often involve the use of a common strategy for multiple
constructions that differ only in items such as choice of restriction enzymes, DNA
fragment size, ORFs encoded, plasmids generated or used as starting
material, specific numbers and sequences of oligonucleotides used for
mutagenesis, sources of plasmids, and enzyme reactions utilized. For the
sake of brevity, the DNA manipulations and constructions are generally
described once, the differing items being detailed by reference to a
particular column in a particular Table, a particular series of manipulations
used in a particular construction occupying horizontal lines within that
Table. One combination, the ORF 11 TxCS with the crystal protein
structural gene, is also detailed in the text.

The following is an outline, diagrammed schematically in Figure 3, of
a preferred strategy used to make the exemplified DNA constructions
detailed in Examples 3 through 6. Endogenous NdeI sites are removed from
the M13-based vector mWB2341, resulting in a vector designated
mWB2341(Nde) (Example 3.1). Large fragments of T-DNA are introduced into
mWB2341(Nde) in a manner that also eliminates the vector's BamHI site
(Example 3.2). Endogenous T-DNA NdeI and BamHI sites are then removed
(Example 3.3) and novel sites are introduced. NdeI sites are introduced
at and near the translational start and stop sites, respectively, so that
a foreign structural gene on an NdeI fragment may replace the endogenous
ORF structural gene. BamHI sites are introduced approximately 0.3 kbp 5'
to and 3' from the transcriptional start and stop signals, respectively,
so that the TxCS/structural gene combination eventually constructed may be
removed on a BamHI fragment (Example 3.4). The structural genes, which
fortuitously have no internal NdeI or BamHI sites, are introduced into
mWB2341(Nde) (Example 4.1) and NdeI sites are introduced at and after the
translational start and stop sites (Examples 4.2 and 4.3). The structural
genes are removed from their vectors on "DNA cassettes" by digestion with
NdeI and are inserted into any desired TxCS which has had its endogenous
structural gene removed by NdeI digestion (Example 6.1). The TxCS/foreign
structural gene combinations are then removed from their vector by
digestion with BamHI and inserted into the plant transformation vectors of
choice (Example 6.2). It is recognized that construction strategies
utilizing fortuitously located restriction sites might be designed by
persons of ordinary skill which might be simpler for some particular
TxCS/structural gene combination than the generalized DNA cassette
strategy utilized herein; however, DNA cassettes are a better approach
when one is trying to achieve flexibility in the choice and matching of
many diverse TxCSs and structural genes.
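
The cassette strategy outlined above depends on a structural gene having no internal NdeI or BamHI sites. As an illustrative sketch only, such a check might be written in Python as follows. The function name and the toy sequence are hypothetical; the recognition sequences used (CATATG for NdeI, GGATCC for BamHI) are the standard ones for these enzymes.

```python
# Recognition sequences of the two enzymes used by the cassette strategy.
# Both are palindromic, so scanning a single strand is sufficient.
RECOGNITION_SITES = {"NdeI": "CATATG", "BamHI": "GGATCC"}


def find_sites(sequence: str) -> dict:
    """Return, for each enzyme, the 1-based start positions of its
    recognition sequence within `sequence`."""
    seq = sequence.upper()
    positions = {}
    for enzyme, site in RECOGNITION_SITES.items():
        hits = []
        index = seq.find(site)
        while index != -1:
            hits.append(index + 1)          # report 1-based coordinates
            index = seq.find(site, index + 1)
        positions[enzyme] = hits
    return positions


# Toy sequence, for illustration only (not taken from any Table or Figure).
print(find_sites("CATATGAAACCCGGATCCTTT"))
```

A structural gene suitable for the strategy would show NdeI sites only at the engineered positions at and after the translational start and stop sites, and no BamHI site at all.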

Example 1 

This Example provides disclosure, analysis, and discussion of the
pRi TL-DNA sequencing results.

1.1 Summary of results

pRi TL-DNA was sequenced and eighteen open reading frames (ORFs),
two of which (7 and 18) were clearly prokaryotic in nature, were found.
Eleven ORFs had canonical eukaryotic promoter and polyadenylation elements
(ORFs 1, 2, 3, 6, 8, 11, 12, 13, 14, 15, and 16). These ORFs were
distributed within a segment of about 19.4 kilobase pairs (kbp) of pRi TL-DNA
integrated into the genome of C. arvensis clone 7. DNA encoding ORFs 8, 11,
12, 13, and 15 was observed to be transcribed in tobacco.

1.2 Sequence of pRi TL-DNA

A physical map of the pRi TL-DNA region is shown in Figure 1 along
with pRi subclones and the nucleotide sequencing strategy used.
Nine-tenths of the sequence obtained was determined from both DNA strands, the
remaining tenth being sequenced more than once from the same DNA strand.
A nucleotide sequence of 21,126 base pairs (bp) was obtained, which
included a 19.4 kbp pRi TL-DNA region identified in the genome of C. arvensis
clone 7, and is presented in Figure 2, 5'-to-3' corresponding to
left-to-right as mapped in Fig. 1. DNA was sequenced from the 5'-end of
BamHI fragment 32 to about 2216 bp into EcoRI fragment 3b (3'-end) (see
Fig. 1). The cleavage sites for over seventy restriction enzymes were
determined; cleavage positions for enzymes with fewer than nineteen sites
are listed in Table 1.

1.3 TL-DNA border repeats

Genomic hybridization and DNA sequence analyses of the TL-DNA region
integrated into the genome of C. arvensis clone 7 showed the exact
location of a left plant/T-DNA junction and an approximate position for a
right pRi TL-DNA/plant junction (F. Leach (1983) Ph.D. Thesis, Universite
de Paris-Sud, Centre d'Orsay, France). The left plant DNA/T-DNA junction
was between positions 570 and 571, as defined in Fig. 2. The left 25 bp
T-DNA border repeat sequence was located between positions 520 and 544.
The right boundary of TL-DNA of RiA4-transformed C. arvensis could vary
over an 8 kbp region. The complete 21,126 bp of the pRi TL-DNA region was
scanned for the presence of a 25 bp consensus sequence derived by
comparison with published sequences, 5'-TGGCAGGATATA...GCTAA...-3'. Twenty-seven
nucleotide sequences matching this consensus at 15 or more bases were
identified. Included among these sequences were the 25 bp nucleotide
sequences starting (5') at positions 520 (matching at 23 of 25 bases) and
19,956 (17 of 25) (see Fig. 2). These two positions were near the
T-DNA/plant junctions of a transformed Nicotiana glauca tissue (F. F.
White et al. (1983) Nature 301:348-350) and C. arvensis clone 7, as
determined by comparison of genomic restriction maps of transformed plant DNA
and pRiA4 DNA. Other matches were found at positions 154, 576, 725, 3244,
6316, 6365, 7209, 7379, 8697, 10339, 10436, 11079, 11232, 12313, 13832,
14235, 14510, 15145, 16285, 17071, 17483, 18121, 18273, 18368, and
18797. The eleven previously published 25 bp border repeat sequences were
as little as 64% homologous to each other, thus indicating that many of
these pRi border sequences could be functional. Genomic hybridization
analysis of the pRi TL-DNA region in tobacco (D. Tepfer (1984) Cell
37:959-967) showed a much smaller TL-DNA with the left junction probably
involving a border sequence at either position 6316 or 6365.
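
The consensus scan described above (a 25 bp window scored against a consensus, with windows matching at 15 or more bases reported) can be sketched in Python as follows. This is an illustration under stated assumptions, not the program actually used: the function name is hypothetical, only one strand is scanned, degenerate consensus positions are handled with standard IUPAC codes, and the 25-mer in the usage example is an arbitrary stand-in rather than the consensus used in the analysis.

```python
# Standard IUPAC nucleotide codes (subset) for degenerate consensus positions.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG",
    "K": "GT", "M": "AC", "N": "ACGT",
}


def scan_consensus(sequence: str, consensus: str, min_matches: int) -> list:
    """Slide a window of len(consensus) along `sequence` and return
    (1-based start, number of matching positions) for every window
    that matches the consensus at `min_matches` or more positions."""
    seq = sequence.upper()
    width = len(consensus)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(base in IUPAC[code] for base, code in zip(window, consensus))
        if score >= min_matches:
            hits.append((start + 1, score))
    return hits


# Stand-in 25-mer, for illustration only.
STAND_IN = "TGGCAGGATATATTGTGGTGTAAAC"
imperfect = "AGGCAGGATATATTGTGGTGTAAAG"   # differs at the first and last base
print(scan_consensus("C" * 10 + imperfect + "C" * 10, STAND_IN, 15))
```

A threshold of 15 of 25 is permissive, which is consistent with the observation above that published border repeats are as little as 64% homologous to each other; a complete scan would also examine the complementary strand.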

1.4 Identification of open reading frames

Analysis of the nucleotide sequence presented in Fig. 2 revealed the
presence of sixteen ORFs starting with an ATG initiation codon and
extending over 300 nucleotides. The locations, sizes, and molecular weights of
the putative translational polypeptides of these ORFs are listed in
Table 3. Two additional ORFs (9 and 10) were shorter than 300 nucleotides
but were included in Table 3 because they satisfied other criteria (see
below). The size of the ORFs ranged from 255 nucleotides (ORF 9) up to
2280 nucleotides (ORF 8), encoding polypeptides ranging in size from 9600
to 85,000 daltons, respectively. However, the actual size of an RNA
transcript encoding an ORF could be considerably larger than that listed in
Table 3 because 5' and 3' noncoding regions and 3'-polyadenylic acid tails
were not included.
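
The ORF criterion stated above (an ATG initiation codon followed by an uninterrupted reading frame of at least 300 nucleotides) can be sketched in Python as follows. This is a minimal illustration, not the analysis software actually used; the function names are hypothetical, the reported length is assumed to include the stop codon, and coordinates on the complementary strand are given along that strand as read 5'-to-3'.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def find_orfs(sequence: str, min_nt: int = 300) -> list:
    """Return (strand, start, end, length) for each ORF that begins at
    the first ATG after the previous stop in its frame and ends at the
    next in-frame stop codon. Positions are 1-based on the strand read."""
    orfs = []
    strands = (("+", sequence.upper()), ("-", reverse_complement(sequence)))
    for strand, seq in strands:
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                       # first in-phase ATG
                elif start is not None and codon in STOP_CODONS:
                    length = i + 3 - start          # includes the stop codon
                    if length >= min_nt:
                        orfs.append((strand, start + 1, i + 3, length))
                    start = None
    return orfs


# Toy sequence: an ATG, 100 alanine codons, and a stop codon.
toy = "CC" + "ATG" + "GCT" * 100 + "TAA" + "GG"
print(find_orfs(toy))
```

Assigning the first in-phase ATG as the initiator mirrors the convention used for Table 3; shorter ORFs such as ORFs 9 and 10 would be found only by lowering `min_nt`.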

Though to date no introns have been found in any of the fourteen
sequenced pTi T-DNA genes (R. F. Barker et al. (1983) Plant Mol. Biol.
2:335-350; J. Gielen et al. (1984) EMBO J. 3:835-846), introns are
present in some plant nuclear genes; pRi TL-DNA genes could have
introns. Transcript mapping (Example 1.9) did not generally indicate
spliced mRNA. However, analysis of mRNA encoded between positions 6500
and 9000 detected two transcripts, a 2300 base transcript as predicted for
ORF 8 and an unpredicted 650 base transcript. The nucleotide sequence of
the only other ORF in this region, ORF 9, suggested a transcript of about
450 bases, about half the size found. The coding region of ORF 8 was
scanned for sequences which matched consensus donor
(5'-exon...G*GT(A/G)AGT...intron-3', the "*" indicating the splice site) and
acceptor (5'-intron...(C/T)AG*G...exon-3') intron splice sequences and
conformed to the G-T/A-G rule (R. Breathnach et al. (1978) Proc. Natl. Acad.
Sci. USA 75:4853-4857) and a plant consensus sequence (J. L. Slightom
et al. (1983) Proc. Natl. Acad. Sci. USA 80:1897-1901). Splicing between
an acceptor at position 8943 and a donor at positions 7283, 7327, 7374,
7701, or 7894 would result in a second transcript having a translation
initiation codon-polyadenylation site distance of 724, 758, 943, 1270, or
1325 bp, respectively, which is in the size range observed. Proper
processing of intron-containing genes in T-DNA has been observed (e.g.
N. Murai et al. (1983) Science 222:476-482).

No homology greater than random was found to exist in coding or
noncoding sequences between pRi TL-DNA and octopine pTi T-DNA (Barker
et al., supra), consistent with the lack of cross-hybridization between
pRi TL-DNA and octopine pTi T-DNA observed by G. A. Huffman et al. (1984)
J. Bacteriol. 157:269-276, and L. Jouanin (1984) Plasmid 12:91-102.

1.5 Translational initiation codons

Eukaryotic translation is preferentially initiated at the first AUG
of an mRNA; an A or G at position -3 and a G at position +4 may facilitate
recognition of functional AUG codons. This (A/G)XXAUGG consensus is referred
to as the ribosome binding site (M. Kozak (1981) Nucl. Acids Res.
9:5233-5252; M. Kozak (1983) Cell 34:971-978). The number of amino acids and
calculated molecular weights for the putative pRi TL-DNA protein products
(Table 3) were derived by assigning the first in-phase AUG codon as the
initiator codon. The art has not ruled out use of secondary AUG codons as
translation initiation codons (M. Kozak (1983) Microbiol. Rev. 47:1-45).

Initiator codon DNA sequences are listed in Table 3 below the
consensus eukaryotic ribosome binding site. Eight of the eighteen ORFs had
first AUG codons which conform with this consensus sequence (ORFs 1, 7, 8,
10, 11, 12, 14, and 18). Of the ten remaining ORFs, four had downstream,
in-phase AUG codons which conformed with the consensus sequence: ORF 2,
287 bp downstream; ORF 3, 160 bp; ORF 6, 344 bp; ORF 13, 203 bp; and
ORF 17, 105 bp (see Fig. 2). The remaining six ORFs (2, 4, 5, 9, 15, and
16) did not have any AUG codons which conform to the consensus sequence
followed by 300 bp in-phase ORFs. The presence of a consensus ribosome
binding AUG codon is not necessary for translation initiation of T-DNA
mRNAs; four abundantly transcribed octopine pTi TL-DNA genes are initiated
at AUG codons which do not conform to the consensus sequences.
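
The consensus test applied to each initiator codon (a purine at position -3 and a G at position +4, the A of the AUG being +1) can be sketched in Python as follows. The function name is hypothetical and the example sequences are toy inputs, not sequences from Table 3.

```python
def fits_initiation_consensus(sequence: str, aug_index: int) -> bool:
    """Return True if the AUG starting at 0-based `aug_index` has an
    A or G at position -3 and a G at position +4.
    DNA input is accepted; T is read as U."""
    seq = sequence.upper().replace("T", "U")
    if seq[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG codon at the given index")
    if aug_index < 3 or aug_index + 3 >= len(seq):
        return False        # not enough flanking sequence to evaluate
    return seq[aug_index - 3] in "AG" and seq[aug_index + 3] == "G"


print(fits_initiation_consensus("ACCAUGG", 3))   # purine at -3, G at +4
print(fits_initiation_consensus("UCCAUGG", 3))   # pyrimidine at -3
```

As noted above, failure of this test does not rule out initiation at a given AUG; it only indicates a departure from the consensus.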

Several pTi T-DNA ORFs are actively transcribed in E. coli
minicells (G. Schroder et al. (1983) EMBO J. 2:403-409). Translational
initiation in E. coli and most prokaryotes generally starts at an AUG codon
that is preceded by a G-rich ribosome binding site (J. Shine and
L. Dalgarno (1974) Proc. Natl. Acad. Sci. USA 71:1342-1346). Sequences
which may function as prokaryotic ribosome binding sites were observed
ahead of the pRi TL-DNA ORF 4, 5, 7, 9, and 18 initiation codons.


1.6 Codon usage 

Most pRi TL-DNA ORFs were observed to fit pTi T-DNA codon
preference patterns, thereby indicating that they are functional after
integration into a plant genome, notable exceptions being ORFs 7 and 18.

1.7 Locations of transcription controlling sequences

Comparisons of nucleotide sequences from the 5'-flanking regions of
many eukaryotic genes have revealed consensus locations and sequences of
several DNA elements which may be important in regulating RNA
polymerase II-mediated transcription (S. L. McKnight and R. Kingsbury (1982)
Science 217:316-324). These characteristic eukaryotic promoter elements
are the "TATA-element", located 25-30 bp upstream (5') from the start of
transcription, and the "CCAAT-element", located 40-50 nucleotides upstream
from the TATA-element (C. Benoist et al. (1980) Nucl. Acids Res.
8:127-142; A. Efstratiadis et al. (1980) Cell 21:653-668). Similar promoter
elements have been found in the 5'-flanking regions of many plant and
pTi T-DNA genes; pTi15955 T-DNA (Barker et al., supra) and pTiAch5 TL-DNA
(Gielen et al., supra) have sequences resembling these TATA and CCAAT
promoter elements located in the 5'-flanking regions of eight TL-DNA and
six TR-DNA ORFs (i.e. have "eukaryotic-looking" promoters). All eight
eukaryotic-looking pTi TL-DNA ORFs are transcribed and at least five of
six eukaryotic-looking pTi TR-DNA ORFs are known to be transcribed.

The presence of TATA and CCAAT promoter elements in 5'-flanking
regions of pRi TL-DNA ORFs indicated that a particular ORF was part of a
functional gene. Most pRi TL-DNA ORFs (16 of 18) were flanked by
sequences (Table 3) that closely resembled these eukaryotic promoter
elements. The amount of sequence identity between the promoter elements and
the consensus sequences was very high; ORFs 2 and 12 had promoter elements
which matched the consensus sequences while the promoter elements from the
other thirteen ORFs did not vary by more than three mismatches. These
results were consistent with the degree of homology found for promoter
elements from pTi T-DNA ORFs (Barker et al., supra; Gielen et al., supra).

pRi TL-DNA open reading frames 1, 4, 8, 10, 13, 14, and 17 were
flanked by multiple promoter elements. ORFs 7 and 18 were not flanked by
sequences resembling eukaryotic promoter elements and were not expected to
be transcribed in plant tissues. ORFs 4, 5, 7, and 9 overlapped ORFs 5,
6, and 8 on the opposite strand (Fig. 1, Table 2); the larger ORFs (5, 6,
and 8) were more likely to be transcribed because DNA encoding
overlapping, antiparallel ORFs in pTi T-DNA was found to be transcribed from
either one strand or the other (Gielen et al., supra).

Comparison of polyadenylation sites present in the 3'-noncoding
regions of plant genes indicates a preference for the hexanucleotide
AATAAA (J. Messing et al. (1983) in Genetic Engineering of Plants, ed.
A. Hollaender, pp. 211-227); however, variations have been observed for
plant genes, e.g. AATAAG and GATAAA. Many pTi T-DNA ORFs are also
followed by AATAAA sequences. The remaining pTi T-DNA ORFs are followed
by polyadenylation sites which vary only slightly, e.g. AATAAT, TATAAA, or
AATGAA; AATAAT is known to function for the ocs gene (H. DeGreve et al.
(1982) J. Mol. Appl. Genet. 1:499-511).
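
A search of a 3'-flanking region for the hexanucleotides named above can be sketched in Python as follows. The function name and the toy input are hypothetical; the hexanucleotide list is limited to the consensus and the variants mentioned in the preceding paragraph.

```python
# Consensus polyadenylation signal first, followed by the variants
# named in the text for plant genes and pTi T-DNA ORFs.
POLY_A_SIGNALS = ("AATAAA", "AATAAG", "GATAAA", "AATAAT", "TATAAA", "AATGAA")


def find_poly_a_signals(flank: str) -> list:
    """Return (1-based position, hexanucleotide) for every listed
    polyadenylation signal found in a 3'-flanking sequence."""
    seq = flank.upper()
    hits = []
    for start in range(len(seq) - 5):
        hexamer = seq[start:start + 6]
        if hexamer in POLY_A_SIGNALS:
            hits.append((start + 1, hexamer))
    return hits


# Toy 3' flank, for illustration only.
print(find_poly_a_signals("CCAATAAAGGGAATAATCC"))
```

More than one hit in a single flank would correspond to the multiple polyadenylation signals reported for some ORFs.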

Presumptive pRi TL-DNA polyadenylation sites and their locations are
listed in Table 3. Ten ORFs (2, 4, 6, 8, 9, 11, 12, 13, 14, and 15) had
the consensus hexanucleotide, AATAAA, near their 3'-ends, whereas only two
(ORFs 7 and 18) did not contain any related sequence (Table 3, Fig. 2).
The remaining ORFs (1, 3, 10, and 16) had polyadenylation sites closely
related to those described above. ORFs 8, 10, 12, 13, and 14 were
followed by multiple polyadenylation signals. Multiple polyadenylation
sites have also been observed in several pTi T-DNA genes (P. Dhaese et al.
(1983) EMBO J. 2:419-426; Gielen et al., supra).

1.8 ORF locations with respect to base composition

The G+C content of the large Agrobacterium plasmids is about 59%
(S. Sheikholeslam et al. (1979) Phytopathol. 69:54-58). In contrast,
pRi TL-DNA had very A+T-rich regions flanking the eukaryotic ORFs while
coding regions had G+C contents in the range of 50%. Plant genes can also
have A+T-rich flanking sequences.
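
The base-composition comparison above can be reproduced on any sequence with a simple G+C calculation. The Python sketch below is illustrative only; the function names, window size, and toy sequence are assumptions, not parameters from the original analysis.

```python
def gc_fraction(sequence: str) -> float:
    """Return the fraction of G and C bases in `sequence`."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(sequence: str, window: int, step: int) -> list:
    """Return (1-based window start, G+C fraction) for successive
    windows of `window` bases taken every `step` bases."""
    return [
        (start + 1, gc_fraction(sequence[start:start + window]))
        for start in range(0, len(sequence) - window + 1, step)
    ]


# Toy sequence: an A+T-rich flank followed by a G+C-rich stretch.
print(windowed_gc("AAAATTTTGGGGCCCC", 8, 8))
```

Plotting the windowed values along a sequence would show A+T-rich flanks as troughs on either side of coding regions with G+C contents near 50%.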

1.9 Detection of transcripts

The TL-DNA left junction with plant DNA found in an A. rhizogenes
transformed tobacco tissue, clone 9, was between the position 6361 HindIII
site and the position 7535 EcoRI site, while the right border was to the
right of the position 19,918 KpnI site (see Example 1.3). Hybridization
of nick-translated pRi TL-DNA probes to membrane filter-bound replicas of
the gels ("Northern blots") clearly showed transcripts carrying ORFs 8 and
13. An observed transcript of about 950 nucleotides which hybridized with
pRi TL-DNA between EcoRI sites at positions 9077 and 13,445 was assigned
to ORF 11. An observed transcript of about 1400 nucleotides which
hybridized with sequences spanning the position 17,059 EcoRI site was assigned
to ORF 15. An observed transcript of about 800 nucleotides which
hybridized with pRi TL-DNA between the positions 9077 and 13,445 EcoRI sites
was assigned to ORF 12.

The relative abundances of pRi TL-DNA transcripts in clone 9-derived
plants were observed to be a function of organ (leaves vs. roots) and
phenotype (T vs. T'; see Tepfer (1984) supra). With the exception of the
transcript corresponding to ORF 12, pRi TL-DNA transcripts were more
prevalent in roots than in leaves, with a particularly striking case being
the mRNA assigned to ORF 15. Expression of the transcript assigned to
ORF 12 was leaf specific and was correlated with the T' phenotype.

RNA from C. arvensis tissue transformed by pRi TL-DNA which included
sequences encoding ORFs 1-6 also hybridized with pRi TL-DNA.

1.10 Conclusions

The data discussed above (Examples 1.2, 1.4-1.8) indicated that of
the ORFs flanked by eukaryotic transcription controlling sequences
(ORFs 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17), ORFs 1, 2,
3, 6, 8, 11, 12, 13, 14, 15, and 16 were most likely to be transcribed.
In tobacco tissue transformed by DNA encoding ORFs 8-18, transcription of
the DNA regions encoding ORFs 8, 11, 12, 13, and 15 has been detected
(Example 1.9).

Example 2 

This Example discloses materials and methods used to obtain the
results disclosed in Example 1.

2.1 Materials

Restriction endonucleases AvaI, BamHI, BglII, EcoRI, HindIII, KpnI,
PstI, PvuII, SalI, StuI, XbaI, and XhoI were obtained from
Promega-Biotec. Enzymes AccI, ClaI, DraI, MstI, MstII, NarI, NcoI, XmnI, and
XorII were obtained from New England Biolabs. Polynucleotide kinase was
from P-L Biochemicals and bovine alkaline phosphatase was from
Boehringer-Mannheim. [γ-32P]ATP (2000-3000 Ci/mmole) was obtained from New England
Nuclear. Chemicals used for DNA sequencing were obtained from the vendors
recommended by A. M. Maxam and W. Gilbert (1980) Meth. Enzymol.
65:499-560. X-ray film on rolls (20 cm x 25 m), XAR-351, was obtained from
Kodak. DuPont Quanta III intensifying screens (35 cm x 1 m) were cut in
half to fit sequencing gels (17.5 cm x 1 m). DNA sequencing gel stands,
designed for gels measuring 20 cm x 104 cm, and safety cabinets were from
Fotodyne Inc., New Berlin, Wisconsin. Water jacket thermostating plates
were constructed using 1/4 inch thick plate glass glued together by 100%
silicone rubber.

2.2 DNA isolation 

Procedures for the isolation and mapping of plasmid and cosmid
subclones of the closely-related Ri plasmids pRiA4 and pRiHRI have been
published: A4 subclones: EcoRI e36 (EcoRI 3a), BamHI 8a, e16 (contains
Ri EcoRI fragments 15, 36, and 37a) by F. Leach (1983) Ph.D. Thesis,
Universite de Paris-Sud, Centre d'Orsay; and pRiHRI subclones: pLJ40
(i.e. cosmid 40) and EcoRI 3b by L. Jouanin (1984) Plasmid 12:91-102.
Plasmid DNAs were prepared as described by H. C. Birnboim and J. Doly
(1979) Nucl. Acids Res. 7:1513-1523, followed by two CsCl, ethidium
bromide gradient bandings.

2.3 DNA sequencing 

DNA sequences were determined using the chemical method, essentially
as described by Maxam and Gilbert, supra. Generally, 10-20 µg of plasmid
DNA was digested with the appropriate restriction enzyme, followed by
removal of the 5'-terminal phosphate with 2-3 units of calf intestinal
alkaline phosphatase. Reactions were done in 100 mM Tris pH 8.4, 55°C for
30 min. Both restriction enzyme and phosphatase were removed by two
phenol and one chloroform extractions. DNA samples were then precipitated
with ethanol, desalted with 70% ethanol, dried, and then resuspended in
15 µl denaturation buffer (50 mM Tris-HCl (pH 9.5), 5 mM spermidine, and
0.5 mM EDTA) and 15 µl H2O. End-labeling with [γ-32P]ATP and isolation of
end-labeled fragments were as described by Maxam and Gilbert, supra. Care
was taken to avoid sequencing errors resulting from the presence of
hydrazine-unreactive 5-methylcytosine bases, found after growth in E. coli at
the second cytosine base of EcoRII or BstNI restriction enzyme sites
(J. L. Slightom et al. (1980) Cell 21:627-638).

Conditions for chemical reactions, at 20°C, were as follows: 1 µl
dimethyl sulfate for G, 30 sec; 30 µl of 95% formic acid for A, 2.5 min;
30 µl of 95% hydrazine for C+T and C, 2.5 min. DNA samples were
electrophoresed for 14 hours at a constant voltage of 2500 V on gels 20 cm wide,
104 cm long, and 0.2 mm thick. Constant gel temperatures (50°C) were
maintained using a water-jacketed plate on one side of the gel sandwich. The
opposite plate of the sandwich was treated with
γ-methacryloxypropyltrimethoxysilane (Sigma 6514) as described by H. Garoff and W. Ansorge
(1980) Analyt. Biochem. 115:450-457, to bind the acrylamide chemically to
the glass. Gel pouring, loading, and autoradiography have been described
by R. F. Barker et al. (1983) Plant Mol. Biol. 2:335-350, and J. L.
Slightom et al. (1983) Proc. Natl. Acad. Sci. USA 80:1897-1901.

Computer programs for DNA sequence analysis were supplied by the 
University of Wisconsin Genetics Computer Group. 


Example 3 

This Example teaches the manipulation of pRi TL-DNA TxCSs
preparatory to insertion of a foreign structural gene.



3.1 Removal of NdeI sites from an M13-based vector

These Examples extensively use oligonucleotide-directed, site-specific
mutagenesis of DNA (see Example 5.2). Although individuals skilled in the
art may choose to use double-stranded DNA methods for such mutagenesis,
single-stranded methods are used as exemplified herein. In general,
single-stranded methods utilize M13-based vectors having inserted E. coli
lac gene sequences. Wild-type M13 contains three NdeI sites while the lac
sequences contain no NdeI site; BamHI sites are absent from both M13 and
lac. Removal of these NdeI sites by site-specific mutagenesis, described
below, may prove essential when replacing a T-DNA structural gene with a
heterologous foreign structural gene (Example 6.1). M13-based vectors
include mWB2341 and related vectors (W. M. Barnes et al. (1983) Meth.
Enzymol. 101:98-122; W. M. Barnes and M. Bevan (1983) Nucl. Acids Res.
11:349-368), and the M13mp-series of vectors (e.g. see J. Norrander et al.
(1983) Gene 26:101-106, J. Messing and J. Vieira (1982) Gene 19:269-276).
mWB2341 and related vectors are linearized by digestion with EcoRI and
HindIII and the resultant sticky-ends are converted to blunt-ends by
incubation with the Klenow fragment of E. coli DNA polymerase I. Most of
the M13mp-series vectors can be linearized by at least one
blunt-end-forming restriction endonuclease (e.g. SmaI or HindII). In the
alternative, particular single-stranded DNA vectors may be preferred for
some operations; other vectors may be substituted for those referred to
above with minor modification of procedures described herein, as will be
understood by those of ordinary skill in the art. Also in the alternative,
double-stranded DNA vectors might be substituted (see references cited in
Example 5.2).

Single-stranded DNA (ssDNA) of the viral form of an M13-based vector
is isolated and subjected to oligonucleotide-directed site-specific
mutagenesis, described in detail in Examples 3.3 and 5, after
hybridization to 5'CAATAGAAAATTCATAGGGTTTACC3',
5'CCTGTTTAGTATCATAGCGTTATAC3', and 5'CATGTCAATCATTTGTACCCCGGTTG3',
thereby removing three NdeI sites which will later prove to be
inconvenient, without changing the translational properties of the
encoded proteins. A mutated M13-based vector lacking three NdeI sites is
identified and designated m13(Nde).
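
A quick way to confirm that a mutagenic oligonucleotide no longer carries an NdeI site is to scan it for the recognition sequence. The sketch below is illustrative only: it takes the NdeI recognition sequence as 5'CATATG3' (as given in Example 3.4), uses two of the oligonucleotides quoted above, and the helper name `count_sites` is hypothetical.

```python
NDEI = "CATATG"  # NdeI recognition sequence, as given in Example 3.4

def count_sites(seq, site=NDEI):
    """Count (possibly overlapping) occurrences of a recognition site."""
    seq = seq.upper()
    width = len(site)
    return sum(1 for i in range(len(seq) - width + 1)
               if seq[i:i + width] == site)

# Two of the mutagenic oligonucleotides quoted in Example 3.1.
oligos = [
    "CCTGTTTAGTATCATAGCGTTATAC",
    "CATGTCAATCATTTGTACCCCGGTTG",
]
for oligo in oligos:
    print(oligo, count_sites(oligo))
```

The same scan applied to the sequence of a candidate clone would indicate how many of the three sites remain, although restriction analysis of the RF, as described above, remains the experimental test.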

3.2 Subcloning pRi TL-DNA into an M13-based vector

DNA of a plasmid listed in Table 4, column 1 (e.g. pLJ40 for
manipulations of the ORFs 11, 12, and 13 promoters and polyadenylation
sites) (see Example 2.2 for the sources of these plasmids) is isolated and
digested to completion with the restriction enzyme(s) listed in Table 4,
column 2 (e.g. SmaI and MstII for ORFs 11, 12, and 13). DNAs of e36 and
pLJ40 are respectively harbored by the deposited strains NRRL B-15958 and
NRRL B-15957. (Alternatively, pRiA4 DNA or pRiHRI DNA may be isolated and
digested with the enzyme(s) listed in Table 4, column 2.) 5'- or
3'-protruding-ends are then converted to blunt-ends by incubation with the
Klenow fragment of E. coli DNA polymerase I or T4 DNA polymerase,
respectively, and all four deoxynucleotide triphosphates. The resulting
mixture of DNA fragments is separated by agarose gel electrophoresis and a
fragment whose size is listed in Table 4, column 3 (e.g. 5.2 kbp for
ORFs 11, 12, and 13) is eluted from the gel.

Covalently-closed-circular DNA (cccDNA) of the replicative form (RF)
of the M13-based vector m13(Nde) is isolated, converted to a linear,
blunt-ended DNA, and has its 5'-phosphates removed by incubation with
phosphatase. The resulting linearized vector is purified by gel
electrophoresis and is mixed with and ligated to the T-DNA fragment
isolated above. After transformation of the resulting mixture into
E. coli, viral DNAs and RFs are isolated from transformants and screened
by restriction and hybridization analysis for the presence of inserts
which, when in single-stranded viral form, are complementary to the
sequence as presented in Fig. 1 and which carry the complete DNA sequence
of ORFs listed in Table 4, column 4. The virus which infects the selected
colony is designated as listed in Table 4, column 5 (e.g. mR4 for ORFs 11,
12, and 13).


3.3 Removal of endogenous NdeI and BamHI sites from pRi TL-DNA

A vector designated as listed in Table 5, column 1 (e.g. mR4' for
manipulations of the ORFs 11, 12, and 13 promoters and polyadenylation
sites) is prepared from the vector listed in the corresponding line of
Table 5, column 2 (e.g. mR4 for ORFs 11, 12, and 13) by primer extension
after hybridization to the oligonucleotides listed in Table 5, column 3
(e.g. 5'GATTAGATAGTCAGATGAGCATGTGC3', 5'GCAAATCGGAGCCCCTCGAATAGG3',
5'GCAATTTGGGAGCCATTGTGATGTGAG3', and 5'CGGTTACGCGGAGCCTATGCGGAGCGCC3' for
ORFs 11, 12, and 13). This operation removes indigenous BamHI sites and
NdeI sites which may be present and which may prove inconvenient in later
manipulations, the sites designated in Table 5, column 4 being at pRi
TL-DNA positions listed in column 5 (e.g. for ORFs 11, 12, and 13, an NdeI
site at position 10,305 and BamHI sites at positions 11,198, 11,278, and
12,816). (Note that there are no BamHI or NdeI sites in mR5.) The sites
may be removed one at a time by hybridization of a particular
oligonucleotide to the ssDNA viral form of the vector listed in Table 5,
column 2 (e.g. mR4 for ORFs 11, 12, and 13), incubation of the
primer/viral DNA complex with the Klenow fragment of E. coli DNA
polymerase I, all four deoxynucleotide triphosphates, and DNA ligase,
enrichment of resulting cccDNA molecules, transformation into E. coli,
selection of transformants, and isolation of RF followed by restriction
enzyme analysis to identify a clone missing the undesired restriction
sites. These steps are repeated for each site which is to be removed.
Alternatively, the vector listed in Table 5, column 2 (e.g. mR4 for
ORFs 11, 12, and 13) may be simultaneously hybridized to all of the
oligonucleotides listed in Table 5, column 3 and then carried through the
mutagenesis procedure, thereby attempting, the procedure not being 100%
efficient, to eliminate all of the sites in a single operation.

3.4 Placement of novel NdeI and BamHI sites in pRi TL-DNA
A vector designated as listed in Table 6, column 1 (e.g. mORF 11 for
manipulations of the ORF 11 promoter and polyadenylation site) is prepared
from the vector listed in the corresponding line of Table 5, column 2
(e.g. mR4' for ORF 11) by primer extension after hybridization to the
oligonucleotides listed in Table 6, column 3 (e.g.
5'GCTGCGAAGGGATCCCTTTGTCGCC3', 5'CGCAAGCTACAACATCATATGGGGCGG3',
5'GGGATCCATATGTGATGTGAGTTGG3', and 5'GCCTAAGAAGGAATGGTGGATCCATGTACGTGC3'
for ORF 11) as described above and in Example 5. This has the effect of
introducing NdeI sites (5'...CATATG...3') at the translational start site
(ATG) and near the translational stop site (TAA, TGA, or TAG), and of
introducing BamHI sites (5'...GGATCC...3') in the sequences flanking the
T-DNA gene, usually approximately 0.3 kbp from the transcriptional start
and polyadenylation sites. The first and fourth oligonucleotides of each
quartet listed in Table 6, column 3 introduce BamHI sites while the
second and third introduce NdeI sites. These sites are located in the
corresponding pRi TL-DNA at the approximate positions listed in Table 6,
column 4. For example, for manipulation of ORF 11,
5'GCTGCGAAGGGATCCCTTTGTCGCC3' and 5'GCCTAAGAAGGAATGGTGGATCCATGTACGTGC3'
introduce BamHI sites at positions 9,974 and 12,001, respectively, while
5'CGCAAGCTACAACATCATATGGGGCGG3' and 5'GGGATCCATATGTGATGTGAGTTGG3'
introduce NdeI sites at positions 10,679 and 11,286, respectively. The
sizes and locations of the TxCS-carrying DNA segments used in these
Examples may be calculated from the positions listed in Table 6, column 4
and the orientations defined in Table 2 and Fig. 1. Positions listed in
Table 6, column 4, of pairs of NdeI and BamHI sites define
promoter-bearing (P) and polyadenylation site-bearing (A) DNA segments as
indicated by "P"s and "A"s, respectively, in column 5, the segments having
approximate sizes as indicated in column 6. For example, the ORF 11
promoter is on an approximately 715 bp DNA segment located between
artificial NdeI and BamHI sites at approximate positions 11,286 and
12,001, respectively, while the ORF 11 polyadenylation site is on an
approximately 705 bp DNA segment located between artificial BamHI and NdeI
sites at approximate positions 9,974 and 10,679, respectively. Note that
mORF12-13 and mORF16-17 provide examples of combinations of a promoter and
a polyadenylation site from two different T-DNA genes.
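
The segment sizes quoted for ORF 11 follow directly from the site positions. As a minimal sketch of that calculation (the dictionary keys and the function name are hypothetical labels, and the positions are the approximate ones quoted above):

```python
# Approximate positions of the artificial sites for ORF 11 (Example 3.4).
sites = {
    "BamHI_polyA_side": 9974,
    "NdeI_polyA_side": 10679,
    "NdeI_promoter_side": 11286,
    "BamHI_promoter_side": 12001,
}

def segment_size(pos_a, pos_b):
    """Approximate segment length in bp between two site positions."""
    return abs(pos_b - pos_a)

promoter_bp = segment_size(sites["NdeI_promoter_side"],
                           sites["BamHI_promoter_side"])
polya_bp = segment_size(sites["BamHI_polyA_side"],
                        sites["NdeI_polyA_side"])
print(promoter_bp, polya_bp)  # 715 705
```

The same subtraction applied to any other row of Table 6, column 4 would be expected to reproduce the corresponding entry of column 6.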

Example 4

This Example teaches the manipulation of four exemplary foreign
structural genes preparatory for insertion into a pRi TL-DNA TxCS. The
genes are for the proteins phaseolin (a nutritionally important seed
storage protein from Phaseolus vulgaris), P. vulgaris lectin (a
nutritionally important protein found in seeds and other plant tissues
which may be involved in symbiotic nitrogen fixation and in making seeds
unpalatable to herbivores), thaumatin (a protein which tastes sweet to
primates, naturally found in Thaumatococcus daniellii), and crystal
protein (a protein produced by Bacillus thuringiensis which is used
commercially to control larval pests of a large number of lepidopteran
insect species). The crystal protein structural gene used here, though
lacking its 3' end, encodes a protein toxic to insect larvae. Phaseolin,
lectin, and thaumatin are eukaryotic genes; crystal protein is
prokaryotic. Phaseolin contains introns; lectin and crystal protein do
not. Because the lectin gene itself contains no introns, it could be
obtained on a 5.7 kbp HindIII fragment from a genomic clone (L. M. Hoffman
(1984) J. Mol. Appl. Genet. 2:447-453) which is part of a plasmid harbored
by the deposited strain NRRL B-15621 (see also Example 6.4). However, in
this Example the lectin structural gene is obtained from a cDNA clone
(L. M. Hoffman et al. (1982) Nucl. Acids Res. 10:7819-7828), as is the
thaumatin gene.

4.1 Subcloning structural genes into M13

The genes listed in Table 7, column 1 are carried by the plasmids
listed in Table 7, column 2, which may be isolated from the deposited
strains listed in Table 7, column 3 (e.g. the crystal protein structural
gene is carried by p123/58-10 which is harbored within NRRL B-15612). DNA
of a plasmid listed in Table 7, column 2 is digested to completion with
the restriction enzyme(s) listed in the corresponding row of Table 7,
column 4 and protruding ends are removed by incubation with the enzyme
listed in Table 7, column 5 (e.g. for manipulation of the crystal protein
structural gene, p123/58-10 DNA is digested with HindIII and the resulting
sticky-ends are removed by incubation with the Klenow fragment of E. coli
DNA polymerase I). A DNA fragment whose size is listed in Table 7,
column 6 (e.g. 6.6 kbp for the crystal protein) is isolated by elution
from an agarose gel after electrophoretic separation. The resulting
fragment is mixed with and ligated to dephosphorylated, blunt-ended,
linearized m13(Nde), prepared as described in Example 3.1, and is
transformed into E. coli. Viral DNAs and RFs are isolated from
transformants and screened by restriction and hybridization analyses for
the presence of inserts which, when in single-stranded viral form, are
complementary to the sequence as present in the mRNA. The vector which
infects the selected colony is designated as listed in Table 7, column 7
(e.g. mBtCP for the crystal protein).

4.2 Placement of NdeI sites flanking three structural genes

DNA of a vector listed in Table 8, column 1 is used to prepare a
vector designated as listed in Table 8, column 2 by primer extension after
hybridization to the oligonucleotides listed in Table 8, column 3 (e.g.
for crystal protein, mBtCP is used to make mBtCP' by extending the primers
5'GGAGGTAACATATGGATAACAATCCG3' and 5'GCGGCAGATTAACGTGTTCATATGCATTCGAG3')
as described in Examples 3.3 and 5. This has the effect of introducing
NdeI sites at the translational start site and near the translational stop
site; there are no BamHI or NdeI sites present within the structural gene
which might otherwise be removed. In the case of the B. thuringiensis
crystal protein gene, a translational stop codon (TAA) is additionally
introduced. The structural genes listed in Table 7, column 1 may be
isolated as a DNA fragment whose size is listed in Table 8, column 4 after
digesting DNA of a vector listed in the corresponding line of Table 8,
column 2 to completion with NdeI (e.g. the crystal protein structural gene
is isolated from mBtCP' on a 2.8 kbp NdeI fragment).
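
Because the last three bases of an NdeI site (CATATG) are ATG, a site placed at the translational start carries the start codon within it. The short sketch below illustrates this with the two crystal protein primers quoted above; the function name is hypothetical and the code only inspects the primer sequences, not the mutated vector.

```python
NDEI = "CATATG"  # NdeI recognition sequence; its last three bases are ATG

def embedded_start_codon(primer):
    """Return the index of the ATG inside the first NdeI site, or -1."""
    site = primer.upper().find(NDEI)
    return -1 if site < 0 else site + 3

start_primer = "GGAGGTAACATATGGATAACAATCCG"
stop_primer = "GCGGCAGATTAACGTGTTCATATGCATTCGAG"

atg = embedded_start_codon(start_primer)
# Sequence from the start codon onward, as carried on the primer.
print(atg, start_primer[atg:])
# Positions of the introduced TAA and of the downstream NdeI site.
print(stop_primer.find("TAA"), stop_primer.find(NDEI))
```

In the second primer the TAA lies upstream of the NdeI site, which is consistent with the statement that a stop codon is introduced in addition to the site near the 3' end.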


4.3 Mutagenesis of thaumatin

Thaumatin cDNA-containing vectors have been disclosed by
C. T. Verrips et al., Eur. Pat. applications 54,330 and 54,331, and
L. Edens et al. (1982) Gene 18:1-12. Thaumatin is originally synthesized
as preprothaumatin, the prefix "pre" representing the presence of a
"signal peptide" having the function of causing the export of thaumatin
from the cytoplasm into the endoplasmic reticulum of the cell in which it
is being synthesized, and the prefix "pro" representing that the protein
is not in mature form. A thaumatin cDNA structural gene is present as the
complement to thaumatin mRNA in M13-101-B (Eur. Pat. application
54,331). The viral form of this vector is used as a source of a thaumatin
structural gene after site-specific mutagenesis directed by two of the
following oligonucleotides:
(a) 5'GGCATCATACATCATATGGCCGCCACC3',
(b) 5'CCTCACGCTCTCCCGCGCATATGGCCACCTTCGAGATCGTCAACCGC3',
(c) 5'CGAGTAAGAGGATGAAGACGGACATATGAGGATACGC3', or
(d) 5'GGGTCACTTTCTGCCCTACTGCCTAACATATCAAGACGACTAAGAGG3'.
When mutated by oligonucleotides (a) and (c), which bind to the 5'- and
3'-ends of the structural gene, respectively, a preprothaumatin sequence
is extracted from the resultant vector by NdeI digestion. When mutated by
oligonucleotides (b) and (d), which bind to the 5'- and 3'-ends,
respectively, a mature thaumatin sequence is similarly extracted. Use of
the combinations of (a) with (d) and (b) with (c) yields fragments
encoding what might be termed prethaumatin and prothaumatin, respectively.
All of these sequences are obtained on fragments having a size of
approximately 0.7 kbp and having no internal NdeI or BamHI sites, which
may be isolated as usual by gel electrophoresis.
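
The four oligonucleotide pairings described above can be summarized as a small lookup. This is only a bookkeeping sketch under the reading that 5' oligonucleotide (a) retains the "pre" portion and 3' oligonucleotide (c) retains the "pro" portion; the function name is hypothetical.

```python
def thaumatin_form(five_prime_oligo, three_prime_oligo):
    """Name the product obtained from one 5' oligo (a or b) and one
    3' oligo (c or d), following the combinations in Example 4.3."""
    if five_prime_oligo not in ("a", "b"):
        raise ValueError("5' oligonucleotide must be 'a' or 'b'")
    if three_prime_oligo not in ("c", "d"):
        raise ValueError("3' oligonucleotide must be 'c' or 'd'")
    prefix = ""
    if five_prime_oligo == "a":   # (a) keeps the signal peptide ("pre")
        prefix += "pre"
    if three_prime_oligo == "c":  # (c) keeps the "pro" portion
        prefix += "pro"
    return prefix + "thaumatin"

for pair in [("a", "c"), ("b", "d"), ("a", "d"), ("b", "c")]:
    print(pair, thaumatin_form(*pair))
```

An empty prefix corresponds to the mature thaumatin sequence.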

4.4 Other possible manipulations 

Phaseolin and lectin, as initially translated, have signal peptides
at their amino-termini, as is the case with thaumatin. If desired, these
signal peptides may be eliminated by placing the 5'-NdeI site between the
codons forming the junction between the signal peptide and the mature
protein. When under control of a T-DNA in a plant cell nucleus, such a
structural gene will cause the synthesis of a phaseolin or lectin protein
which is not exported from the cell's cytoplasm. Sequences useful for
designing oligonucleotides for manipulating phaseolin and lectin
structural genes are respectively reported by J. L. Slightom et al. (1983)
Proc. Natl. Acad. Sci. USA 80:1897-1901, and Hoffman et al. (1982) supra.
Example 5

This Example describes techniques for the synthesis and use of syn- 
thetic oligonucleotides. Other useful references can be found in the list 
of works cited in the section introductory to these Examples. 


5.1 Oligonucleotide synthesis 

Chemical synthesis of DNA utilizes a number of techniques well known
to those skilled in the art of DNA synthesis. Modification of nucleosides
is described by H. Schaller et al. (1963) J. Amer. Chem. Soc.
85:3821-3827, and H. Buchi and H. G. Khorana (1972) J. Mol. Biol.
72:251-288. Preparation of deoxynucleoside phosphoramidites is described
by S. L. Beaucage and M. H. Caruthers (1981) Tetrahedron Lett.
22:1859-1862. Preparation of solid phase resin is described by S. P.
Adams et al. (1983) J. Amer. Chem. Soc. 105:661-663. Hybridization
procedures useful during the formation of double-stranded molecules are
described by J. J. Rossi et al. (1982) J. Biol. Chem. 257:9226-9229.

5.2 Oligonucleotide-directed site-specific mutagenesis

General methods of directed mutagenesis have been reviewed by
D. Shortle et al. (1981) Ann. Rev. Genet. 15:265-294. Of special utility
in manipulation of genes is oligonucleotide-directed site-specific
mutagenesis, reviewed recently by C. S. Craik (1985) Biotechniques
3:12-19; M. J. Zoller and M. Smith (1983) Meth. Enzymol. 100:468-500;
M. Smith and S. Gillam (1981) in Genetic Engineering; Principles and
Methods, Vol. 3, eds.: J. K. Setlow and A. Hollaender; and M. Smith (1982)
Trends in Biochem. 7:440-442. This technique permits the change of one or
more base pairs in a DNA sequence or the introduction of small insertions
or deletions. Recent examples of oligonucleotide-directed mutagenesis
include W. Kramer et al. (1984) Nucl. Acids Res. 12:9441-9456; Zoller and
Smith (1983) supra; M. J. Zoller and M. Smith (1982) Nucleic Acids Res.
10:6487-6500; G. Dalbadie-McFarland et al. (1982) Proc. Natl. Acad. Sci.
USA 79:6409-6413; G. F. M. Simons et al. (1982) Nucleic Acids Res.
10:821-832; and C. A. Hutchison III et al. (1978) J. Biol. Chem.
253:6551-6560. Oligonucleotide-directed mutation using double-stranded DNA
vectors is also possible (R. B. Wallace et al. (1980) Science
209:1396-1400; G. P. Vlasuk et al. (1983) J. Biol. Chem. 258:7141-7148;
E. D. Lewis et al. (1983) Proc. Natl. Acad. Sci. USA 80:7065-7069;
Y. Morinaga et al. (1984) Biotechnol. 2:636-639). See Example 3.1 for
useful M13-based vectors.

Example 6 

This Example teaches use of the pRi TL-DNA TxCSs and the foreign
structural genes manipulated in Examples 3 and 4, respectively. Specific
examples of plant transformation vectors, plant transformation, and plant
regeneration are given below in Examples 6.4-6.7.

6.1 Assembly of TxCS/structural gene combinations

A plasmid listed in Table 6, column 1 (e.g. mORF 11) is digested
with NdeI and dephosphorylated with phosphatase, and the opened vector may
be separated from the T-DNA structural gene found nested within the
TxCS. A plasmid listed in Table 8, column 2 is digested with NdeI and the
corresponding structural gene listed in Table 7, column 1 is isolated as a
fragment whose size is listed in Table 8, column 4 by agarose gel
electrophoresis followed by elution from the gel (e.g. the crystal protein
structural gene is isolated from mBtCP' on a 2.8 kbp NdeI fragment).
Additionally, a thaumatin-encoding fragment may be isolated as described
in Example 4.3. Any desired combination of an opened TxCS vector and an
isolated foreign structural gene may now be mixed with each other and
ligated together. For example, the crystal protein structural gene may be
placed between an ORF 11 promoter and an ORF 11 polyadenylation site,
thereby replacing the structural gene of ORF 11 with that of the crystal
protein, by ligating the 2.8 kbp NdeI fragment of mBtCP' into
NdeI-digested mORF 11 DNA. The ligation mixtures are individually
transformed into E. coli and RFs are isolated from the resultant
transformants and characterized by restriction analysis. A colony is
chosen for each transformation which lacks the endogenous pRi TL-DNA
structural gene and has a single copy of the heterologous foreign
structural gene inserted within the TxCS, the structural gene and the TxCS
being in such orientation with respect to each other that the gene is
expressible under control of the TxCS when within a plant cell.
6.2 Assembly of plant transformation vectors

A TxCS/heterologous foreign structural gene combination may be
removed from the M13-based vector constructed in Example 6.1 by digestion
with BamHI followed by agarose gel electrophoresis and elution. The size
of the BamHI fragment bearing the promoter/structural
gene/polyadenylation site may be calculated by adding the size of the
structural gene-bearing fragment, as listed in Table 8, column 4, to the
sizes of the promoter and polyadenylation site-bearing segments, as listed
in Table 6, column 6. For example, an ORF 11 TxCS/crystal protein
structural gene combination, as exemplified herein, may be obtained on a
4.2 kbp BamHI fragment (2.8 kbp + 715 bp + 705 bp). A TxCS/gene
combination may be inserted directly into a 5'GATC...3' sticky-ended site,
which may be generated by BamHI, BclI, BglII, MboI, or Sau3AI.
Alternatively, the combination may be inserted into any desired
restriction site by conversion of sticky-ends into blunt-ends followed by
blunt-end ligation or by use of appropriate oligonucleotide linkers.
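
The compatibility of the five enzymes named above can be illustrated by deriving the overhang each one leaves. The recognition sequences and cut offsets in this sketch are the standard ones for these enzymes but are not stated in the text, so they should be read as assumptions; the function name is hypothetical.

```python
# Recognition sequence and top-strand cut offset for each enzyme.
# Assumed standard values; they are not given in the specification.
ENZYMES = {
    "BamHI": ("GGATCC", 1),
    "BclI": ("TGATCA", 1),
    "BglII": ("AGATCT", 1),
    "MboI": ("GATC", 0),
    "Sau3AI": ("GATC", 0),
}

def five_prime_overhang(site, cut):
    """5' overhang left by a symmetric staggered cut in a palindromic site."""
    return site[cut:len(site) - cut]

overhangs = {name: five_prime_overhang(site, cut)
             for name, (site, cut) in ENZYMES.items()}
print(overhangs)
```

Since every enzyme in the table yields the same four-base overhang, a BamHI fragment can be ligated into a site opened by any of them.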

An alternative to assembly of a pRi TL-DNA TxCS/structural gene
combination followed by insertion of that combination into a plant
transformation vector is the insertion of a pRi TxCS into a plant
transformation vector followed by insertion of the structural gene into
the TxCS/transformation vector combination. It is advantageous that the
plant transformation vector not contain NdeI sites if the particular
manipulation strategy exemplified herein is to be used. Otherwise the
TxCS/vector combination may be linearized by partial NdeI digestion, as
will be understood in the art.

6.3 Vector choice, transformation and plant regeneration 

The plant transformation vector into which the TxCS/gene combination
is to be inserted may be a TIP-based system such as a TIP plasmid, a
shuttle vector for introduction of novel DNAs into TIP plasmids, or a
sub-TIP plasmid, e.g. mini-Ti or micro-Ti. Alternatively, a vector based
upon a DNA virus, minichromosome, transposon, or homologous or
nonhomologous recombination into plant chromosomes may be utilized. Any
mode of delivery into the plant cell which is to be initially transformed
may be used which is appropriate to the particular plant transformation
vector into which the TxCS/structural gene combination is inserted. These
forms of delivery include transfer from an Agrobacterium cell, fusion with
vector-containing liposomes or bacterial spheroplasts, direct uptake of
nucleic acid, encapsidation in viral coat protein followed by an
infection-like process, or microinjection.
The initially transformed plant cells are propagated and used to
produce plant tissue and whole plants by any means known to the art which
is appropriate for the plant transformation vector and delivery mode being
used. Methods appropriate for TIP-based transformation systems include
those described by M.-D. Chilton et al. (1982) Nature 295:432-434, for
carrots, and K. A. Barton et al. (1983) Cell 32:1033-1043, for tobacco.
Selection of transformed cells may be done with the drugs and selectable
markers as described in the Background. The exact drug, concentration,
plant tissue, plant species and cultivar must be carefully matched and
chosen for ability to regenerate and for efficient selection. Screening of
transformed tissues for tissues expressing the foreign structural gene may
be done using immunoassays known to the art. Southern, northern, and dot
blots, all methods well known to those skilled in the art of molecular
biology, may be used to detect incorporated or expressed nucleic acids.
Screening for opine production is also often useful.


6.4 Preparation of a disarmed T-DNA vector

pRK-203-Kan-103-Lec, harbored by E. coli C600 (pRK-203-Kan-103-Lec),
which is on deposit as NRRL B-15621, is a pRK290 derivative containing
T-DNA sequences of pTi15955 from between EcoRI sites at positions 4,494
and 12,823, as defined by R. F. Barker et al. (1983) Plant Mol. Biol.
2:335-350, except for a deletion of sequences between the position 5,512
HindIII site and the position 9,062 BamHI site. Inserted into the
deletion, i.e. substituting for the deleted T-DNA, is a Tn5-derived
kanamycin resistance (kan) gene and a Phaseolus vulgaris seed lectin gene
(see Example 4, Hoffman (1984) supra). The lectin gene may be deleted from
pRK-203-Kan-103-Lec by digestion with HindIII followed by religation; the
resultant vector is designated pRK-203-Kan-103. BamHI-digested,
dephosphorylated pRK-203-Kan-103 is mixed with and ligated to a BamHI
fragment bearing the pRi TL-DNA TxCS/heterologous foreign structural gene
combination assembled in Example 6.2; the resultant vector is designated
pRK-203-Ri-Kan-103. pRK-203-Ri-Kan-103 is introduced into A. tumefaciens
ATCC15955 using methods well known in the art, and a double-homologous
recombinant, designated RS-Ri-Kan, is identified. RS-Ri-Kan does not
harbor pRK-203-Ri-Kan-103, but contains a mutated pTi15955 having a
substitution of a TxCS/structural gene combination and a kan gene for pTi
T-DNA between the position 5,512 HindIII site and the position 9,062 BamHI
site. This substitution deletes some tmr and tms sequences, thereby
disarming the T-DNA. RS-Ri-Kan T-DNA transforms inoculated plant tissue
without conferring the phenotype of hormone-independent growth. Tobacco
tissues transformed by RS-Ri-Kan may be regenerated into normal plants
using protocols well known in the art for regeneration of untransformed
tissue.

6.5 Construction of a micro-Ti plasmid 

p102, a pBR322 clone of the pTi15955 T-DNA fragment between HindIII
sites at positions 602 and 3,390 (as defined by R. F. Barker et al.,
supra), carries the left border of TL and promoter sequences associated
with ORF 1. p233 is a pBR322 clone of the pTi15955 T-DNA BamHI/EcoRI
fragment spanning positions 9,062 and 16,202. The T-DNA of p233 includes a
SmaI/BclI fragment spanning positions 11,207 and 14,711, having ocs, a
3'-deleted tml, and the right border of TL. p233 was linearized with
SmaI, mixed with and ligated to a commercially available blunt-end BglII
linker, trimmed with BglII, religated to itself, and transformed into
E. coli GM33 (a dam- host that does not methylate DNA in a manner
incompatible with the action of BclI, M. G. Marinus and N. R. Morris
(1974) J. Mol. Biol. 85:309-322). A colony was identified which harbored a
plasmid, designated p233G, having a BglII site in the location formerly
occupied by the position 11,207 SmaI site. p233G DNA was digested with
BglII and BclI and a 3.5 kbp fragment was isolated by agarose gel
electrophoresis followed by elution. The 3.5 kbp BglII/BclI fragment was
mixed with and ligated to BglII-digested, phosphatase-treated p102 DNA.
The ligation mixture was transformed into E. coli K802 (W. B. Wood (1966)
J. Mol. Biol. 16:118). Plasmid DNAs from ampicillin-resistant
transformants were characterized by restriction analysis and a colony was
identified, designated pAK-4, having the BglII/BclI fragment of p233G
inserted into the BglII site of p102 and oriented so that the ocs gene was
located between the left and right TL borders. One BglII site, also
between the borders, was regenerated, and a BglII/BclI suture, not
susceptible to the action of either enzyme, was generated to the right of
the right border.
pAK-4 may be represented as follows: 

pBR322...HindIII...left border...BglII...ocs...right border...
(BglII/BclI)...HindIII...pBR322
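
Why the BglII/BclI suture resists both enzymes while a BglII/BglII junction is regenerated can be seen by writing out the joined sequences. This is a minimal sketch that assumes the standard recognition sequences A^GATCT (BglII) and T^GATCA (BclI), which are not stated in the text; the function names are hypothetical.

```python
BGLII = "AGATCT"  # assumed BglII site, cut after the first base
BCLI = "TGATCA"   # assumed BclI site, cut after the first base

def join_ends(left_site, right_site, cut=1):
    """Top-strand sequence formed when the left half of one cut site is
    ligated to the right half of another sharing the GATC overhang."""
    return left_site[:cut] + right_site[cut:]

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))

suture = join_ends(BGLII, BCLI)        # BglII end joined to a BclI end
regenerated = join_ends(BGLII, BGLII)  # BglII end joined to a BglII end

for seq in (suture, revcomp(suture), regenerated):
    print(seq, seq in (BGLII, BCLI))
```

Neither strand of the hybrid junction matches either recognition sequence, whereas the BglII/BglII junction restores the original site.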


The T-DNA of pAK-4 may be removed on a 6 kbp HindIII fragment.
HindIII-digested pAK-4 DNA was mixed with and ligated to
HindIII-linearized, phosphatase-treated pSUP106 DNA. pSUP106, a 10 kbp
wide host-range plasmid capable of maintenance in both E. coli and
Agrobacterium (R. Simon et al. (1983) in Molecular Genetics of the
Bacteria-Plant Interaction, ed.: A. Puhler, pp. 98-105), is harbored by
E. coli CSH52 (pSUP106) which is on deposit as NRRL B-15486. The reaction
mixture was transformed into K802 and plasmid DNAs from
chloramphenicol-resistant transformants were characterized by restriction
analysis. A colony was identified harboring a plasmid, designated pAN6,
having the Agrobacterium DNA of pAK-4 inserted into the HindIII site of
pSUP106, oriented so that the BglII/BclI suture was proximal to the
pSUP106 EcoRI site. pAN6 is a micro-Ti plasmid having within its two T-DNA
borders a functional ocs gene and a BglII site that is unique to the
plasmid. The BglII site is flanked by an incomplete tml gene and the pTi
ORF 1 promoter, both of which are transcribed towards the BglII site.

BamHI-digested, dephosphorylated pAN6 is mixed with and ligated to a
BamHI fragment bearing the pRi TL-DNA TxCS/heterologous foreign structural
gene combination assembled in Example 6.2; the resultant vector is
designated pAN6-Ri. pAN6-Ri may be introduced into an Agrobacterium strain
having a helper plasmid, e.g. LBA4404 (G. Ooms et al. (1981) Gene
14:33-50), using methods well known in the art.

6.6 Inoculation of tobacco stems 

Stems of sterile Nicotiana tabacum var. Xanthi are cut into segments
approximately 1 cm long. These segments are placed basal end up in Petri
dishes containing Murashige and Skoog medium (MS medium: 1.65 g/l NH4NO3,
1.9 g/l KNO3, 440 mg/l CaCl2·2H2O, 370 mg/l MgSO4·7H2O, 170 mg/l KH2PO4,
0.83 mg/l KI, 6.2 mg/l H3BO3, 22.3 mg/l MnSO4·4H2O, 8.6 mg/l ZnSO4·7H2O,
0.25 mg/l Na2MoO4·2H2O, 0.025 mg/l CuSO4·5H2O, 0.025 mg/l CoCl2·6H2O,
37.23 mg/l Na2EDTA, 27.85 mg/l FeSO4·7H2O, 1 g/l inositol, 50 mg/l
nicotinic acid, 50 mg/l pyridoxine·HCl, 50 mg/l thiamine·HCl, 30 g/l
sucrose, and 8 g/l agar, pH 5.8) without hormonal supplement, a medium
well known in the art. The basal (upper) ends are then inoculated with
Agrobacterium cells by puncturing the cut surface of the stem with a
syringe needle. After two weeks of incubation at 28°C with 16 hr light
and 8 hr dark, calli develop at the upper surface of all stem segments.
The callus regions are then transferred to MS medium containing 2.0 mg/l
NAA (1-naphthalene acetic acid), 0.3 mg/l kinetin and 0.5 mg/ml
carbenicillin. After two weeks on this medium, the tissues are free of
bacteria and can be assayed for the presence of opines, a methodology well
known in the art.

Once free of inciting bacteria, the transformed plant tissues are
grown on MS medium with hormones at 25°C with 16 hr light and 8 hr dark.
These tissues are cloned using a suspension method described by A. N.
Binns and F. Meins (1979) Planta 145:365-369. Briefly, tissues are placed
in liquid MS medium supplemented with 2.0 mg/l NAA and 0.1 mg/l kinetin,
and shaken at 135 rpm at 28°C for 10-14 days. The resultant suspensions
are filtered successively through 0.543 and 0.213 mm mesh sieves,
concentrated, and plated at a final density of 8 x 10^3 cells/ml in MS
medium supplemented with 2.0 mg/l NAA and 0.3 mg/l kinetin. After these
grow to approximately 100 mg, colonies are split into two pieces. One
piece is placed on complete MS medium and the other is screened for the
presence of opines. Approximately 0-50% of the colonies are found to be
opine-positive, depending on the particular parental uncloned callus
piece from which the colonies were descended. Uncloned pieces having
higher concentrations of opine tended to yield a higher percentage of
opine-positive clones.
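Plating at a fixed final density requires diluting the concentrated suspension by a known factor. The sketch below illustrates that calculation only; it is not part of the original procedure. The function name `dilution_volumes` and the example stock density are hypothetical, and the stock density would in practice come from a cell count.

```python
def dilution_volumes(stock_cells_per_ml, target_cells_per_ml, final_volume_ml):
    """Return (stock_ml, medium_ml) needed to reach the target cell density.

    Uses the conservation relation C1 * V1 = C2 * V2.
    """
    if stock_cells_per_ml < target_cells_per_ml:
        raise ValueError("stock suspension is more dilute than the target")
    stock_ml = target_cells_per_ml * final_volume_ml / stock_cells_per_ml
    return stock_ml, final_volume_ml - stock_ml


if __name__ == "__main__":
    # Hypothetical stock counted at 8 x 10^4 cells/ml, plated at 8 x 10^3 cells/ml.
    stock_ml, medium_ml = dilution_volumes(8e4, 8e3, 10.0)
    print(f"{stock_ml} ml suspension + {medium_ml} ml medium")
```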



6.7 Regeneration of recombinant plants 

Tissues from various opine-positive clones are transferred onto MS
medium supplemented with 0.3 mg/l kinetin and cultured at 28°C with 16 hr
light and 8 hr dark. Shoots initiated are subsequently rooted by placing
them in MS medium without hormones. Rooted plantlets are transferred to
soil and placed at high humidity in a greenhouse. After 7-10 days, the
plants are grown under normal greenhouse conditions. Regenerated plants
derived from opine-positive clones contain opines. The presence of
opines thereby indicates that these normal-looking plants are
transformed by T-DNA.

Table 1 Restriction Enzyme Sites in pRi TL-DNA Region

Enzyme      No. Sites   Locations
Bst E II     1          3993
Sna I        1          6459
Apa I        2          3390, 17851
Mst II       2          [illegible], 15021
Sma I        2          3075, 9863
Xba I        2          676, 4999
Kpn I        3          3364, 14133, 19918
Mlu I        3          17606, 20793, 20856
Nco I        3          2262, 10133, 21021
Sst II       3          3431, 14691, 17037
Xho I        3          9242, 11003, 20700
Bam HI       4          1343, 11198, 11278, 12816
Hpa I        4          8375, 12459, 13700, 18818
Nde I        4          3519, 3861, 4822, 10308
Nru I        4          5281, 10968, 11617, 18901
Sal I        4          4515, 6047, 12655, 15821
Ava III      5          13684, 14382, 15480, 16415, 18262
BssH II      5          5727, 6847, 19761, 20260, 20660
BstX I       5          2269, 4226, 9912, 16016, 18309
Cla I        5          35, 753, 11421, 12598, 21110
Nar I        5          465, 4114, 11356, 16441, 20385
Nsi I        5          13688, 14386, 15484, 16419, 18266
Sca I        5          1794, 4546, 10166, 11500, 13858
Tth111 I     5          3413, 3816, 8217, 8769, 11369
Xma III      5          5814, 7970, 8502, 10613, 20347
Aat II       6          974, 5615, 6054, 7521, 9272, 19089
Asu II       6          4792, 10026, 12954, 16897, 19418, 19436
Hind III     6          5602, 6361, 9814, 11537, 15827, 17404
Mst I        6          4004, 8091, 11427, 16088, 19690, 20408
Pst I        6          2244, 4892, 7003, 10486, 10533, 17780
Xor II       6          230, 2659, 4480, 5694, 8509, 16962
Bcl I        7          992, 1364, 6710, 10564, 18673, 19403,
                        19827
Bgl II       7          4197, 5525, 7879, 11239, 13097, 15517,
                        15760
EcoR I       7          7585, 9077, 13445, 15358, 17059, 18766,
                        18911

Table 1 continued

Enzyme      No. Sites   Locations
Acc I        8          333, 4516, 6048, 6460, 9514, 12656,
                        15822, 19089
Bal I        8          497, 3568, 5488, 9233, 9339, 9916,
                        12001, [illegible]
Sph I        8          582, 11476, 15013, 15057, 15486, 17175,
                        [illegible], [illegible]
Xmn I        8          1759, 2725, 4498, 4546, 10103, 12206,
                        17338, 17917
EcoR V       9          5134, 6738, 7775, 10098, 10626, 13173,
                        14048, 16080, 17491
Sst I        9          1967, 4152, 10879, 11068, 12395, 14105,
                        17016, 19214, 19866
Stu I        9          5590, 6696, 7512, 11442, 12066, 15967,
                        16656, 20186, 20467
Bgl I       10          1571, 3125, 5872, 5956, 6832, 9775,
                        10912, 14290, 16606, 21065
Ava I       11          3073, 3765, 5268, 7012, 9242, 9861,
                        10573, 10629, 11003, 14402, 20700
Aha III     12          2486, 11334, 12233, 13427, 13580, 13666,
                        15577, 15599, 16168, 18135, 18573, 20070
Nae I       13          316, 446, 1664, 3931, 3962, 5733,
                        7616, 9771, 15000, 16622, 18474, 20380,
                        20652
Pvu II      13          250, 1235, 1859, 2395, 2752, 7888,
                        8451, 12042, 13715, 15590, 15620, 16056,
                        18688

Enzyme      No. Sites
Ban II      19
HgiA I      19
Ban I       20
Hinc II     21
Xho II      22
Hae II      23
Nci I       23
Aha II      24
Ava II      26
BstN I      35
Hph I       37
Rsa I       38
HinF I      41
Haa I       42
Fok I       48
Dde I       55
Mbo II      63
Sau96 I     66
Fnu II      68
Bbv I       69
Hpa II      72
Cfo I       80
Hinp I      80
Alu I       87
Sau3A       87
Hae III     99
Taq I      113
Fnu 4A     132
Mnl I      171
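
The locations in Table 1 can be spot-checked against the sequence listing with a short script. The sketch below is not part of the original disclosure. The names `find_sites`, `IUPAC`, `cut_offset`, and `first_coord` are hypothetical, and it assumes that each tabulated location is the coordinate of the last base before the cut on the listed strand. Under that assumption, the 60-base line of the listing that ends at position 7620 reproduces the EcoR I entry 7585 (G^AATTC) and the Nae I entry 7616 (GCC^GGC).

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes,
# so degenerate recognition sequences such as GGTNACC can be searched.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "R": "[AG]", "Y": "[CT]",
    "M": "[AC]", "K": "[GT]", "W": "[AT]", "S": "[CG]",
}


def find_sites(seq, site, cut_offset, first_coord=1):
    """Return coordinates of the last base before each cut.

    seq         -- DNA sequence (one strand, 5' to 3')
    site        -- recognition sequence, IUPAC codes allowed
    cut_offset  -- number of bases of the site that lie 5' of the cut
    first_coord -- coordinate of the first base of seq
    """
    # A lookahead keeps matches zero-width, so overlapping sites are found.
    pattern = re.compile("(?=" + "".join(IUPAC[b] for b in site.upper()) + ")")
    return [first_coord + m.start() + cut_offset - 1
            for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # Line of the sequence listing covering positions 7561-7620.
    fragment = "TGTCATACTATCTGGATAAGTTTAGAATTCCGTCCAGCCTTCGTTTTCCTTGTGCCGGCA"
    print("EcoR I:", find_sites(fragment, "GAATTC", 1, first_coord=7561))
    print("Nae I: ", find_sites(fragment, "GCCGGC", 3, first_coord=7561))
```

Only the top strand is searched, which is sufficient for palindromic recognition sequences; a non-palindromic site would also need a search of the reverse complement.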
CLAIMS

We claim:

1. A method of genetically modifying a plant cell comprising the step of
   transforming the cell to contain a pRi T-DNA promoter and a
   heterologous foreign structural gene, the promoter and the structural
   gene being in such position and orientation with respect to one
   another that the structural gene is expressible in a plant cell under
   control of the promoter.

2. A method according to claim 1, wherein the pRi T-DNA is hybridizable
   to pRiHRI TL-DNA.

3. A method according to claim 2, wherein the T-DNA promoter is from a
   gene selected from a group consisting of genes for ORFs 1, 2, 3, 4,
   5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, and 17.

4. A method according to claim 3, wherein the T-DNA gene is selected
   from a group consisting of genes for ORFs 1, 2, 3, 6, 8, 11, 12, 13,
   14, 15, and 16.

5. A method according to claim 4, wherein the T-DNA gene is selected
   from a group consisting of genes for ORFs 8, 11, 12, 13, and 15.

6. A method according to claim 2, wherein the T-DNA gene is from pRiHRI
   T-DNA, pRiA4 T-DNA, or a T-DNA essentially identical thereto.

7. A method according to claim 1, wherein the cell is additionally
   transformed to contain a pRi TL-DNA transcript terminator, the
   promoter, the structural gene, and the transcript terminator being in
   such position and orientation with respect to one another that
   transcriptional termination of the structural gene in a plant cell is
   under control of the transcript terminator.

8. A method according to claim 1, wherein the promoter or the structural
   gene comprises an insertion, deletion, or substitution of one or more
   nucleotide pairs.

9. A method according to claim 1, wherein the structural gene changes a
   phenotype of a plant or plant cell when expressed therein.

10. A method according to claim 9, wherein the structural gene encodes an
    insecticidal toxin identical to or derived from the crystal protein
    of Bacillus thuringiensis.






11. A method according to claim 9, wherein the structural gene is
    hybridizable to a phaseolin gene.

12. A method according to claim 9, wherein the structural gene encodes
    thaumatin or a precursor of thaumatin.

13. A method according to claim 9, wherein the structural gene encodes a
    legume lectin.

14. A method according to claim 1, comprising the step of integrating the
    promoter/structural gene combination into a plant chromosome, whereby
    the combination is flanked by plant DNA.

15. A plant, plant cell, or plant tissue, or plant seed derived or
    descended from a genetically modified plant cell produced by the
    method of claim 14.

16. A plant, plant cell, or plant tissue, or plant seed derived or
    descended from a genetically modified plant cell produced by the
    method of claim 1.

17. A DNA molecule comprising a pRi T-DNA promoter and a heterologous
    foreign structural gene, the promoter and the structural gene being
    in such position and orientation with respect to one another that the
    structural gene is expressible in a plant cell under control of the
    promoter.

18. A DNA according to claim 17, wherein the pRi T-DNA is hybridizable to
    pRiHRI TL-DNA.

19. A DNA according to claim 18, wherein the T-DNA promoter is from a
    gene selected from a group consisting of genes for ORFs 1, 2, 3, 4,
    5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, and 17.

20. A DNA according to claim 19, wherein the T-DNA gene is selected from
    a group consisting of genes for ORFs 1, 2, 3, 6, 8, 11, 12, 13, 14,
    15, and 16.

21. A DNA according to claim 20, wherein the T-DNA gene is selected from
    a group consisting of genes for ORFs 8, 11, 12, 13, and 15.

22. A DNA according to claim 18, wherein the T-DNA gene is from pRiHRI
    T-DNA, pRiA4 T-DNA, or a T-DNA essentially identical thereto.

23. A DNA molecule according to claim 17, further comprising a pRi TL-DNA
    transcript terminator, the promoter, the structural gene and the
    transcript terminator being in such position and orientation with
    respect to one another that transcriptional termination of the
    structural gene in a plant cell is under control of the transcript
    terminator.

24. A DNA according to claim 17, wherein the promoter or the structural
    gene comprises an insertion, deletion, or substitution of one or more
    nucleotides.

25. A DNA according to claim 17, wherein the structural gene changes a
    phenotype of a plant or plant cell when expressed therein.

26. A DNA according to claim 25, wherein the structural gene encodes an
    insecticidal toxin identical to or derived from the crystal protein
    of Bacillus thuringiensis.

27. A DNA according to claim 25, wherein the structural gene is
    hybridizable to a phaseolin gene.

28. A DNA according to claim 25, wherein the structural gene encodes
    thaumatin or a precursor of thaumatin.

29. A DNA according to claim 25, wherein the structural gene encodes a
    legume lectin.

30. A DNA according to claim 17, wherein the DNA is contained within a
    bacterium.

31. A DNA according to claim 30, wherein the bacterium is E. coli or is
    of the genus Agrobacterium.

32. A DNA according to claim 17, wherein the DNA is within a plant cell.

33. A DNA according to claim 32, wherein the plant cell is within a
    plant, a plant tissue, or a plant seed.

34. A DNA according to claim 17, wherein the promoter/structural gene
    combination is flanked by plant DNA.

35. A DNA according to claim 34, wherein the DNA is within a plant cell,
    a plant tissue, a plant, or a plant seed.

36. A DNA according to claim 17, wherein the DNA is within a plant cell,
    a plant tissue, a plant, or a plant seed.

37. A DNA molecule comprising a heterologous foreign structural gene and
    a pRi TL-DNA transcript terminator, the structural gene and the
    transcript terminator being in such position and orientation with
    respect to one another that transcriptional termination of the
    structural gene in a plant cell is under control of the transcript
    terminator.

38. A DNA according to claim 37, wherein the transcript terminator is
    derived from a gene selected from a group consisting of genes for
    ORFs 1, 2, 3, 6, 8, 11, 12, 13, 14, 15, and 16.

39. A DNA according to claim 38, wherein the T-DNA gene is selected from
    a group consisting of genes for ORFs 8, 11, 12, 13, and 15.

40. A DNA according to claim 37, wherein the DNA is within a plant cell,
    a plant tissue, a plant, or a plant seed.

GGCCGCAGGATTTCGTTCGTCGTGCGTGATGAGATCGATAAATGTTTATCGACGAGGACA 60

AGATCGACGATGCGGTTCTTGCGCTGTTGTAGTGACGCTCCACAACGAGTGTTGCGCCGT 120 

GAAAGGCTTTGACTGGGCCGCGACGGACCGCCTTTGCAGGAAGGGTTCGGTCGGCGATCC 180 

CGTCAATAAATCGAAGCTATTGATCCTGACGGATAAAGGTCTGCGTCGATCGGAGGAGCT 240 

ATTCCGACAGCTGTTTACGCGCTAGCCATTGGCCGACGGTCTTTGCGCCCTCCATTCCCA 300 

CGGCGTAGTTAATGCCGGCGGGGACGGGAGTGTCTACTATGTGCAAGCACGTCGGCGAAC 360 

CATGCCTTCGGATTAATGTCGTTCAGACGGGCGGTCGTAAGTTGAATGAG7ATGACTGCC 420 

GCATGGTCAGCGCCGCGTTGGGAGCCGGCAGATGTCCAGTCGCGGCGCCTCAAGGCCATC 480 

ACATGTTCACTCTGTGGCCAGAAGGCGTCGCTCCTTGGGTGGCAGGATATATTGTGATGT 540 

AAACAGATTAGATATGGACATGCGAAGTCGTTTTAACGCATGCTTTATCGAATATAAAAT 600 

GTAGATGGGCTAATGTGGTTTTACGTCATGTGAATAAAAGTTCAGCATTCGTTTAATAAT 660 

ATTTCAATATCGGTGTCTAGAGACCCGTGGATTTGTATAGTCAGCACCATGATATGAATC 720 

TATAAAATATTGTATCTCCAATTGCAATTCAATCGATAT^AGAAATTAATACAAGCCGTT 780 

CATATAGTAAGGTTGCCAATGGCATTCAATAACGACCGTACAGTTGCCGCTATATTAATC 840 

tACGTGCCATTTCTTMATAMGATAGGCGAATGACTATCGAAA^TA'AAACAATTATTAA 900 

TGAGTGAAAACGTATTGCACAAATAAAGATTCATTATGGTTGGCTCAAATTTTGGCTCTG 960 

GTGCTCGATGACGTCGAGATGAGGACAGTAGTGATCAACTTGGCGGTCGATACCTTGGTT 1020 

ACGCCACTCCCAGAGTGCCATGTCGTCCTCCGAGCGGTCTGAGATAACCCAGTCGGCAAT 1080 

TGCTGCTGCATTGCCGGGCGTTCCCCAACCACGACGAATATGCTTTCGTTCATCTAACTC 1140 

GCGTCGCACTGCCCTCCCAGTCATGAAGTCAAAGCCAAATTCTACCCTCTCTCCATTTCC 1200 

CAGCTCAGTCGAGAAATCGTAACACCTCGTGGCAGCTGACAGTTTCAGAAAGGGGCGTAT 1260 

CCCTCGAACTCCAGGGTCCTCTTTCACATAGTTAGCAAGGCGTACTGCTGCATAATCTGC 1320 

GTTGAAGGCTCTGATGACTACAGGATCCTCGGACAAGCCCAATTGATCAGGGCGAACCCT 1380 

CGCGCTCATAATATGAATTGCGACGACCCTTGCTTCCTGTCGGAGCATCGAATCAATCCA 1440 

AGCCTTCCCTGCGGCATAGAGGTCATCGACTGCGATGTCATCAAGATCGAGTAGCTTTGC 1500 

CAACCTAGGAAGTTCUGAGGAAAAATCACCGGCATGACAGCAACCGTCTCTCGCCAGTC 1560 
AGTTGCCGGACTGGCTTCCCTAACGCCATCCACGAATGCCTCACCGCTTGCGTATTTGAA 1620
TGTGTAAAAGAGAAGGACCACTCTTTGGCGGTACTTCGGACGCCGGCTTAGCCACGCGGC 1680 
AATAATGTGGGCCTCAAACTCACGACCATCCAAAAATATAGTCGCGCCTGGATTGACCTC 1740
GCTGGCCTTGTCGAGAAGAGGTTCCAAAAAGGGAACGGTGTCTTTCGTAATAGTACTTAA 1800
ATCTGTGAGTTCGCCATGCGAAACCTCTCGAACGATTATCGGCGTATCCCTGACATCAGC 1860
TGAATGAAATTCTCGGACGAGTTTGTCGGGCAAAGTGGAGACCCGCCACGTGTTGAAGTC 1920 
GTGGGAAACGATGGGCACATCGTCGCCGGTGAGTGCGGCATCGAGCTCAGAGAGGTTCCG 1980 
CCTGCCAACCTCACCGAGAGCAGCTAACAACGAAGTTTCGGTGCATTCCTGTATCCCTTT 2040 
ACCCAGATTATACATGCCCCGGTGTTCGATAACTTGAAGAGGCAGTGGCTCCTCAAGATG 2100 
TTCAAGGAGGTGGGGTACAGAGTGCCGGGCGAGGACCTCATCCACCGTGACACCAACCGG 2160 
GAGATCCCATTCGAGTTTCCACTGGGGCCAGCATGTGCCCGCGACGGCGAAAGGTTTGCG 2220 
CTGGCAAAGAACCCGGCTGCTGCAGGTGGACCTATCCTTACCCATGGCAATGGGGTTTTG 2280 
CTAAAAAGTCAGGCACTTTACTGGGCAATTGATAGGGTGGGATTGCGTTATTAACTGTTC 2340
TCCAGCGGGAATCTTTATCTTTATTGAAATGCTAAAGCACTTAGATAAAATACAGCTGTA 2400
CCGCAATATAAAATAGTAGGATAATGTAATATGTGTATCGAGAATACGACAAGCTAATAT 2460
AATCTAGCGTCAAATTGCAATAATTTAAATCAAAACTACTGATGAAATAATAAAAGATGG 2520 
TCAATTTTTATTGGTAGGAGTTGTCGAAAGATTCGACGGACGGCCATTACAATACATAGG 2580 
TGCAAGAAGTAAAACAGGAAGGGAAACGGAAAACAGTGCTATAAAAAAGCGACAGATCGC 2640
GGCGATCACTGACTGCGATCGGGAAGAAGCTCGCCAAGTTCACCGAGAATAGCAGAGAGC 2700
GCATCCTCATCGGGTACTACGAACACATTCGTCCCAGAGGGCTTTGTTTCAGCTGCGCCA 2760 
ACCCAGAAAGCAAGGCCATTTTCCAAGTTGCCGATGGCGGTCAGCATGTTTTGATTGTTG 2820 
CTGCCGTTTCCACAAGCGATGTGAAGGCCGATCCCGTGAGAGAGGCCCTTGACGAAGGTG 2880
AAATAGCCT7TGGATTTTCCAACTGTTTCAACGGGCACTAGATATTGACCCTCTGGCGCG 2940 
GCAACCACCTTGAATTTGCGAGATGACTGGTTGCCGATGAGCGAAGAAAGCATTTCTCCG 3000 
GCTTCTTTGTAAGATTTGTGAGATTCCCACATTTGACAGCCGTAGAAATGCCCCATCGGA 3060
ATGTTGCGGATTCCCGGGATGCCACCAAAnTGTTCTCCATAGCCGCGTGAACGGClTGC 3120 
CAGTTGGGCAGGGAGAAAGAATCGAAGCGATCATCTTTGTAGATCGTGACCATTCCATCA 3180
TtTCCCTGGAATCCGATATTTTCAATGGCGCTGAAAACTGACCTTGCGATTTCTTCGCAT 3240 
TCCCGTGCGGATGTGAGCAATTGATAATGGCCCTTGCAGGCGATCCTGGTCAAATTGGCG 3300 

ATGATGTTGATGGCAGGATTAATATCCCAACACTGGTGATTTCGATCTTGCTTAAAGGTG 3360 

GTACCATCGCCGTCGAAGGCGAGCAGGGCCCGGAGAGATGAATCGGCAAGACTGCGTCGG 3420 

ACCCGCTCCGCGGCGTCGGGAATGAGGCTGATAAGAGACATATCCAAAGGTGTTTGTGGG 3480

TAACGGGCTGCTCAATGAAGCCTTAAATGCAACGCAACATATGTAAGGATGAGTTGACTT 3540 

ATTGGAGAGAGAAATAGGAATGAGCTGGCCAGCCATTATCAACGTGGGGCCATGCTGACA 3600 

ATGTTTACGTGAAAGGCTCAACTACCTCGAAGCAGACCTCTATATTCGTTGACTTTATTA 3660 

CTGAACAAGAAGTTGCTTGCCACTCATTTTCTTAAATCTTGCCCTTTCTGCGCCTCGCTA 3720 

TCATGCCCGCCAACGACGCGACATGCGCTGCCGCGATTGCCTTCCCCGAGGGCAACTGGA 3780 

AGGAAGAACTTGATGCGCTCCGCACCTTGTGTGACCCCGTCGAGGTGGTTAAGGTCGCAG 3840 

TCGGCAGAGGTCTTAGCGGCATATGTAATGTTGTTGCAGCAATGAATCCCACAAAGGTGA 3900 

GGGGCCTGGGCGATGTCATCGGGC'AGATGCeGGCTCTTAATCACCGTAT^GCT 3960 

CCGGCGAAACTCCGGTGCGAGACCTTGGAATAGGTTACCAGTGCGCAATCTGCCACCCCG 4020 

ACATAGCCAGTGCGATGTTAGCCACTTCTGAGGGGATCAGCCACGTTCTCCGTGAAAGGA 4080 

TTGAGAAAGAAGTTGACCGGGACATTGGAGAAGGCGCCACCGTCTGCATTTTCGTTCAGC 4140 

CGAGAATGAGCTCCAAGGGCTCTCCAGTTTCTGTCCATTTCACCCTCCAGTTTGCGAGAT 4200 

CTGGAACTCTTGTCGATGCCAGAATGATGGAGAGTTACAATTTCATGAAAGGCAATGGCA 4260 

CAGTGACCGCACCGGATTTGAAAAGTCATTGGAAGAAGCACGGTATTGACAGGCCAGGCC 4320 

CACGTCCGCCCACGTCCAAGTTTGAACTCCTCTTCGCCGCTGTCCCCGACAACAGTAAAC 4380 

TTGCCGCCACCGATTTTACCCATCTCGGCCCTGTCGAGCGTGATAAGGAACTACTCGGCA 4440 

GCACGGTATTCGGGATTGCCGCTAAGAAACCTGGTACGATCGTTTATCCGTGCGAAAAGG 4500 

TTCTCTGTTTGGAGGTCGACGTACACGCGCATCGCGCCCTAGAAGTACTTCACCGCCTTG 4560 

GGGAACAGGCTTATAGCAATGGCCGTGGCACTAGCTTCGGTCTTCACACCGGTCCGTCCT 4620 

CTTGCCTTAATCTTTCCGCCGCCGCGCTCGCTACATTTTTCAAACGCTCGGA1 CTCTGTT 4680 
CCCTTCCATTGAGTGATGCTTTTGTCCTTTTCTGCGACCCGCCACCGCCTACAGCGCCAA 4740

GAAAGATGGCCTTCCGATCACTGCCTTCTCCCCCACGAGCACCAATCAGTTCGAACTCGT 4800 

AGAGCCTCAGGTCGTCAAGGCATATGTTCTCGGACTTTTCGACGCGCCGACGATGGTTAC 4860 

GCCCCGCGACAAAACGCGAGCCAGCTTCTGCAGCCAATATGTACGTTTCCGTGAACCGCA 4920 

TCCCTGTGAAGAGTTCAATGAAATTGGAGTTTTGATCCTCGATGCTGCTGCTAAAATGCT 4980 

CGAACGTTATGCAAAATTTCTAGAAGATGGTGGAAGAGATGATGATGAAATGGCGAACAT 5040 

AATAGATGTATTTGGGTTTTGTCTTAACTAGTGGATTGATTGAAACAAAGGAGTCCGAGT 5100 

TGGGATTCCCTTTCGGTCTTCGTCGTGCAACGATATCGTATGCGTACAGGTATCACATTT 5160 

AACGTTGCTGCGGCGGACCGAGCCCGCTTGGAAGCGATTGTTGCAGCTCCAACTTCTGCT 5220 

CAGAAGCACGTGTGGCGAGCGAAGATCATCTTGATGAGCAGTGATGGCTCGGGAACGGTC 5280 

GCGATCATGGAGGCAACCGGTAAATCCAAAACCTGTGTCTGGCGCTGGCAGGAGCGCTTC 5340 

ATGACTGAGGGCGTCGATGGCCTTTTGCACGACAAGAGCAGACCGCCCGGCATTGCGCCG 5400 

CTTGATGGCGAACTCGTTGAGCGTGTCGTCGCACTGACGCTTGAGACGCCTCAACAGGAA 5460 

GCAACGCACTGGACTGTTCGTGCGATGGCCAAGGCCGTTGGGATTGCAGCCTCTTCGGTT 5520 

GTGAAGATCTGGCACGAGCATGGTCTTGCGCCGCATCGCTGGCGCTCTTTCAAACTGTCG 5580 

AACGACAAGGCCTTTGCCGAGAAGCTTCACGACGTCGTTGGCCTCTACGTCTCGCCACCG 5640 

GCCCATGCCATT6TCCTGTCCGTCGATGAGAAGAGCCAGATCCAGGCACTCGATCGGACG 5700 

CAACCGGGACTCCCCTTGAAGAAAGGGCGCGCCGGCACAATGACCCACGATTACAAGCGC 5760 

CACGGCACCACCACCCTATTTGCCGCCCTCAACATCCTCGACGGCTCGGTGATCGGCCGA 5820 
AACATGCAGCGTCACCGGCATCAGGAGTTCATCCGTTTTCTCAACGCCATCGAGGCGGAA 5880
CTGCCAAAGGACAAGGCCGTCCACGTCATTCTCGACAATTACGCGACCCATAAGCAGCCG 5940

AAGGTCCGCGCCTGGCTGGCAAGGCATCCGCGCTGGACCTTCCACTTCGTCCCAACATCA 6000 

TGTTCATGGCTGAACGCCGTCGAGGGATTCTTCGCTAAATTGACACGTCGACGTCTGAAG 6060 

CACGGTGTCTTTCATTCCGTCGTTGACCTCCAGGCCACCATCAACCGCT7CGTCAGAGAG 6120 

CATAATCAGGAACCAAAGCCGTTCATCTGGAGAGCAGATCCAGACGAGATCATTGCAGCC 6180 

G1CAAACGTGGGCACCAAGCGT7GGAATCAATCCACTAGCGTATGAACAGTAATAAGAAA 6240 
ATCCCGATTGTGAATAGTCCCAATTTCAAATGTGTCCGTGTGTAATTTGCGTGTCTTCAG 6300

TTGAATTTCCTTTAATAATATCAAATATTCAATTGTGAAAAGTTGTATTGGTTCAGGTTC 6360 

AAGCTTTCCGAATTTGTTGAATTTTATTCCCTGTTTTCAATTTGTTGACTTGTTTGGGAG 6420 

ACACCTTTTTTGTGTTTCGTGAACATGTCACCCCTTCGGTATACATTAGCCTACAAAGTA 6480 

AATAACGTTGATAAATGTCACTCATGTTGTAATAAAATTGAGCTTATTATGTATAACCAG 6540 

ACCCTGTGTTAATCTAATTACAAAGAAATTCATCATTCTCCCAAGCAATCCTGAGTAGCT 6600 

GCGTGATGGATCTTCCATATCAGCGCCCACGTTTCACCCCGTTTGCCGTCACCCATCCAC 6660 

GTAGTGGAGTCAACCTGAACCGTGCAATTTCTCAGGCCTTTGTCTGCTATGATCAGTTCT 6720 

GCGAACGGCTCTTGCGATATCAGCAAAGCTGGACGGATTGGGTGTTCGACCACGGATTTG 6780 

CAGAAGCCATTGAAGACGTGGCGCTGGTGTTCCA6GTTGCACCTTGCCTTCATGGCCCCC 6840 

GAATAGGCGCGCTCGAAGTGTTGATACCTCGTCGCACCCAGGTCTTCATTTATATGTCGA 6900 

ACAACCAATTGCAGCGCTTTGTTGCACACCAGTGCATTGCTCAACTTGGCGACGCCGTGC 6960 

TTGCTTGCATGATCCCGCCCTACGCGAGTGACCTCTCGCTGCAGGAAATGGCTCGGGCGC 7020 

ACAACAGATTTTGCCCAGGCAGTTACACGAGGTCCGCAGACGTACAGTGCTTTATCGCCA 7080 

TCCAACTCAGCAGCCGATTCGTTGAGGAGGGCACATGTAACGTGCACGGGCGAAATGGCT 7140 

TAAAAAGAACCTGCCGCTTCTTTCGTCGCCCTGCTGAGTTCTTCAGCCGTTATGACATCG 7200

TTGCCATTGGGCCGGTGCTCTTCCATGATGAACTGGATTGCCCAGCAAACTGCAATGAGC 7260 

CTCTTTCCTGCTTTGACCTGCGGTACGACTATCAGGTTTTCCTCCAGGAGTGCGATGCCC 7320 

ATGATGGTGTGGGGCATTATCCGGAAGGCGCACCACTACCTAGTGTTGCCATCGTAGGAG 7380 

GCGGGCTGTCTGGCCTTGTTGCTGCCACAGAACTACTTGGCGCTGGCGTCAAGGAAATCA 7440 

CTCTTTTCGATACCGTTGATGAGATCCGTAGTTTTGGGGCATCGCCGATGCCAAACGGCG 7500 

ACGCTCACCAGGCCTTGACGTCGTTCGGTGTCATGCCTTTCTCCGCCAACCAACTTTGCC 7560 

TGTCATACTATCTGGATAAGTTTAGAATTCCGTCCAGCCTTCGTTTTCCTTGTGCCGGCA 7620 

ACGACCACACAGCACTATATTTCCGCCAGAAACGCTACGCATGGCACGCGGGGCAAGCTC 7680 

CGCCGGGGATATTTCAGCGGGTACATGTCGGATGGAAGACACTACTCTACCAAGGGTGTG 7740 

AACGGAATGGCAGGAGACTGATGGCTCCGATGGATA7CTCTTTCATGTTGAAAGAGCG1C 7800 
GTCGTGATGAAGCCTCAGAAGCACGGCAGCTTTGGCTCCGAGAGTTCGGAAAATTCACTT 7860

TCCATGCCGTTTTGGTCGAGATCTTCAGCTGTGGTAATTCGAGTCCTGGTGGCAAGGCAT 7920 

GGCAAACACCCCATGATTTCGAGGCTTTCGGGATACTGAGGTTGGGATACGGCCGAGTTT 7980 

CGTCCTATTACAACGTGTTGTITTCAACGATCCTGGACTGGATTATCAATGGCTACGAGG 8040 

AGGACCAGCATCTTTCTATTGGTGGGGTTCAACTTTTGCAGGCTCTGATGCGCATTGAAA 8100 

TATTCCAGAAAAGCCATGCGAAAGCACGACTCTGTTTTGATCCCGTGCGTGGAATAGCCA 8160 

AGGAGGGCGGGAGATT6AAGGTATGCTTGAAACACGGTCATTCGCGTGTTTTTGACCAGG 8220 

TCATCATTGGCGGCAGTGCTGAGGCCGCTACAGTTGATAACAGACTGGCCGGGGATGAGA 8280 

CTTCCTTCAGCTACAATATCGAACCCGCCGTCGGAAACTCGTCTGCCGCTGTCAATTCAG 8340 

CACTCTTCATGGTCACGAAGCAAAAGTTTTGGGTTAACTCCGGCATCCCAGCAGTGATAT 8400 

GGACCGATGGGCTTGTCCGTGAGCTGTGTTGCATTGACATCGAATCGCCAGCTGGAGAGG 8460 

GCCTTGTCGTTTTTCACTATGCTTTGGATGACTATCTATCCCGGCCGATCGAGCATCATG 8520 

ACAAGAAGGGACGGTGCTTGGAATTGGTCAGGGAGCTTGCTGCTGCCTTTCCTGAACTGG 8580 

CTTGTCACCTGGTCCCAGTCAACGAAGACTACGAACGATATGTCTTCGACGACCACCTAA 8640 

CGGATGGTTTTAAGGGAGCTTTGTGGAGGGAAAATTCTCTGGAAAAAGGTCAGTATATCC 8700 

AGGATCTGCCTGGGAATAATTTTCCTATTGGGGATCACGGGGGAGCCTATCTGATTGACC 8760 

GTGACGACTGCGTCACCGGAGCCTCGTTCGAGGAGCAGGTGAAGGCGGGCATCAAAGCGG 8820 

CCTGCGCCGTCATCCGCAGCACCGGCGGGACGCTCTCTTCACTCCAACCGGTGGACTGGA 8880 

ATAAAAAATAGAAATTTCCTGATTAAGTTATAGTCAATGTACTATTGCGTGTTAATCCCG 8940 

TAGGTATGCAAGCTGCACCGGCAGCATCATAATTTGATGTTCCATCAATAAATTAAGGTG 9000 

CCCGTTCATTGTGTATTACATTATGTATGTTTATCAAAAATATAATCGAAGTCCATTTTA 9060 

AGTCTGATATTAATTGGAATTCCAAACGATTCCTTGATGCCTATCTTCGCTATGATTGTA 9120 

TGGTAATAAAGTCTCCACATCTCCCGAAAAATGCTTTCGTGATTTACTTGTCTCTCACGT 9180 

GCTTTCGCATCTTGACAGCCAAAAGTGGGCAACTTGAGAAGAGTATTAACTGGCCACGCA 9240 

ACTCGAtjATATTCCCACTAACCCCAATGACGTCATTGCACTCGTCACGGGTAGCAGCCCC 9300 

ACTTGCCTTTGCCACTTTATTAATTCT7TGGCCCACTGGCCATTAATTGGCACC1ACATA 9360 
TATTAGTGGAGAAGATAAAGTGTCACTATCGTTTCCTGTTCAATTTTGAATTTTGCAAGG 9420

ATTTCATGTTGTCAACTACACAGCTTGAAAGGAAATCCGCAATCAACGGAGAAACGTCAA 9480 

CATCTCGACAAAAAAAGAATGCTTCATCATTGCGTAGACTGCATATTGACCGCTCCTTTC 9540 

GGCGCTGGGCCTGCTTTTACTGTTGCCTAGCGTTCGGACAGCCACCAGAGAATGGGCTAT 9600 

ATAGATCCTTTCATCAAACCAAAACATTACTAAGATCATGCTGTAACGCTTCAATACGGT 9660 

GAGTGTGGTTGTAGGTTCAATTATTACTATTTTTGAAGCTGTGTATTTCCCTTTTTCTAA 9720 

TATGCACCTATTTCATGTTTCAGAATGGAATTAGCCGGACTAAACGTCGCCGGCATGGCC 9780 

CAGACCTTCGGAGTATTATCGCTCGTCTGTTCTAAGCTTGTTAGGCGTGCAAAGGCCAAG 9840 

AGGAAGGCCAAACGGGTATCCCCGGGCGAACGCGACCATCTTGCTGAGCCAGCCAATCTG 9900 

AGCACCACTCCTTTGGCCATGACTTCCCAAGCCCGACCGGGACGTTCAACGACCCGCGAG 9960 

TTGCTGCGAAGGGACCCTTTGTCGCCGGACGTGAAAATTCAGACCTACGGGAITAATACG 10020 

CATTTCGAAACAAACCTACGGGATTAATACGCACGTGGCTGGCGGTCTTCGATTCATTTC 10080 

CACGCCGGAGATGATATCGAATATGTTCTGTTAAGTTAAAATAAGCTGCGAGCCATGGCG 10140 

CGATTGTCCTGTTTTATTAATATAGTACTTTAACGTCTCTTTAGAGCGTTTGTGTAATGT 10200 

CGTGAAAATGTTTTATGTCAAATGTACTGTTGAACTATAATATTATAAGTCCAGGTGTGT 10260 

CGTTGTTGTTGATACTGCAATATATGTGTAGTAGATTAGATAGTCATATGAGCATGTGCT 10320 


GTTTTTGGCAAAATTCAGCAGCAGGATCAACACAGAAGAAAATATTTAGTACAAGAAAAT 10380 

AGGTCAACACATTACAACGTACGCTACAACTCCCAAGGTTCTGTGTCACAGACTGCGGGA 10440 

GGGTACATAGAACTTATGACAAACTCATAGATAAAGGTTGCCTGCAGGGGGAGTTCAAGT 10500 

CGGCTTTAGGCTTCTTTCTTCAGGTTTACTGCAGCAGGCTTCATGACGCCCTCCTCGCCT 10560 

TCCTGATCAGGCCCCGAGAGTCGCAGGGTTAGGTCTGGCTCCGGTGAGGAGGCGGCCGGA 10620 

CGTGATATCCCGAGGGCATTTTTGGTGAATTGTGTGGTGCCGCAAGCTACAACATCATAG 10680 

GGGCGGTTTTCAGTCCCTCGCCGCAGAAAGAAGGTGCAAGCTACCTCTCTCCCGTAAACG 10740 

TTGGTCACTTTTAACTCCAGCAAGTGAATGAACAAGGAACTTGCGAAAATGGCGATGAAG 10800 

CATTC7AAATCAGGTTCCTCCGTGCGGC7GTGCGGCCAAGCAAGGTTGTGAACACGGAGC 10860 

ATCTCCTGGAGGGCGAGCTCGCTCCGATATGGTTGAATCGTTGTCGCCAGCACGGCCTCC 10920 
ATTCCAAATGTAATGGATTGTTCCTTCAGCACTTTCTGCATCTTCTCGCGAGAAAGATAG 10980

ACAAATACATGTTGGTCGTTTTCTCGAGCCAGATCCGGCTGACTAACAAACATAGGAGGA 11040 

TGATAGCAGACTTtGTTCTTCAAGAGCTCAGCTAGTTGTTTAAGTATATATATCGGTGGA 11100 

GAGTTTTCCTTCAAATCTAGCACTGCAAGAGCCCATCGTTTCTGGAAATGCAGGAGGGGT 11160 

TTGCTATAGTCACGGCTATAGATTGCAAAAGCAAATCGGATCCCCTCGAATAGGTTTATC 11220 

TGGCTCCATGCTGGAGTGAGATCTACTGGTTGAAATCGTGGAAGGAATAGCAATTTGGGA 11280 

TCCATTGTGATGTGAGTTGGATAGTTACGAAAAAGGCAAGTGCCAGGGCCATTTAAAATA 11340 

CGGCGTCGGAAACTGGCGCCAATCAGACACAGTCTCTGGTCGGGAAAGCCAGAGGTAGTT 11400 

TGGCAACAATCACATCAAGATCGATGCGCAAGACACGGGAGGCCTTAAAATCTGGATCAA 11460 

GCGAAAATACTGCATGCGTGATCGTTCATGGGTTCATAGTACTGGGTTTGCTTTTTCTTG 11520 

TCGTGTTGTTTGGCCTTAGCGAAAGGATGTCAAAAAAGGA7GCCCATAATTGGGAGGAGT 11580 

GGGGTAAAGCTTAAAGTTGGCCCGCTATTGGATTTCGCGAAAGCGGCATTGGCAAACGTG 11640 

AAGATTGCTGCATTCAAGATACTTTTTCTATTTTCTGGTTAAGATGTAAAGTATTGCCAC 11700 

AATCATATTAATTACTAACATTGTATATGTAATATAGTGCGGAAATTATCTATGCCAAAA 11760

TGATGTATTAATAATAGCAATAATAATATGTGTTAATCTTTTTCAATCGGGAATACGTTT 11820

AAGCGATTATCGTGTTGAATAAATTATTCCAAAAGGAAATACATGGTTTTGGAGAACCTG 11880

CTATAGATATATGCCAAATTTACACTAGTTTAGTGGGTGCAAAACTATTATCTCTGTTTC 11940 

TGAGTTTAATAAAAAATAAATAAGCAGGGCGAATAGCAGTTAGCCTAAGAAGGAATGGTG 12000 

GCCATGTACGTGCTTTTAAGAGACCCTATAATAAATTGCCAGCTGTGTTGCTTTGGTGCC 12060 

GACAGGCCTAACGTGGGGTTTAGCTTGACAAAGTAGCGCCTTTCCGCAGCATAAATAAAG 12120 

GTAGGCGGGTGCGTCCCATTATTAAAGGAAAAAGCAAAAGCTGAGATTCCATAGACCACA 12180 

AACCACCATTATTGGAGGACAGAACCTATTCCCTCACGTGGGTCGCTAGCTTTAAACCTA 12240 

ATAAGTAAAAACAATTAAAAGCAGGCAGGTGTCCCTTCTATATTCGCACAACGAGGCGAC 12300 

GTGGAGCATCGACAGCCGCATCCATTAATTAATAAATTTGTGGACCTATACCTAACTCAA 12360 

ATATTTTTATTATTTGCTCCAATACGCTAAGAGCTCTGGATTATAAATAGTTTGGATGCT 12420 

TCGAGTTATGGG7ACAAGCAACCTG1TTCCTACTTTGTTAACATGGCTGAAGACGACCTG 12480 
TGTTCTCTCTTTTTCAAGCTCAAAGTGGAGGATGTGACAAGCAGCGATGAGCTAGCTAGA 12540

CACATGAAGAACGCCTCAAATGAGCGTAAACCCTTGATCGAGCCGGGTGAGAATCAATCG 12600 

ATGGATATTGACGAAGAAGGAGGGTCGGTGGGCCACGGGCTGCTGTACCTCTACGTCGAC 12660 

TGCCCGACGATGATGCTCTGCTTCTATGGAGGGTCCTTGCCTTACAATTGGATGCAAGGC 12720 

GCACTCCTCACCAACCTTCCCCCGTACCAGCATGATGTGACTCTCGATGAGGTCAA7AGA 12780 

GGGCTCAGGCAAGCATCAGGTTTTTTCGGTTACGCGGATCCTATGCGGAGCGCCTACTTC 12840 

GCTGCATTTTCTTTCCCTGGGCGTGTCATCAAGCTGAATGAGCAGATGGAGCTAACTTCG 12900 

ACAAAGGGAAAGTGTCTGACATTCGACCTCTATGCCAGCACCCAGCTTAGGTTCGAACCT 12960 

GGTGAGTTGGTGAGGCATGGCGAGTGCAAGTTTGCAATCGGCTAATGGTTAGTCGATGGG 13020 

CTGACGAGTTTGATGTCAGGAGAAGCTGAGTGTGTCACTTGTTTCCCTTTAAGAAGTATT 13080 

AATGTAATAAAAATCAAGATCTGGTTTAATAACTGGATACTTGATTTCATCGCGCTTTTT 13140 

TTGAATAAATGTTTGTTGTCTTGACTTTAAGATATCCTTTGAAATTTGCGTTATTCGTAT 13200 

TTCGCTTTTGGTTATTTCCAAAAGACTTTGCTCAGTAAGATCAAACGTTTGTATTTCTCC 13260 

GGGGCACMTATTTGAGCTATvATGGAGTGGGGG'AGGGGCGGG-AATAGATGAAMTTGCCA^ 13320 

AAATTAGCTATCGGTCTTCTGAAAAGAAGGGCCGACATGTTTTCATAGACCATGCAAAGT 13380 

CATACTACCTGAAACTGATAAATAACGACAAAGAAAGTAGCCTATTTAAAAGTCGCTATA 13440 

GCATGAATTCAACACAAGGAAACCAAAAGTCGGAAGGAAGACTTTAATCCCGGATTATTT 13500 

GGACATGATAGGAGCTATGGGGCAACGTGTCATTTTCATGAGTGTTGAATGATTTTCTGT 13560 

AGCAAATAGAAAACGTTTTTTAAAACGATGTGGCCTTGGAGTAATCAGCGGAAGAAATGG 13620 

TCATGCTCAGATAATTTCCGTTGCTGACCTCGCAACCAACCCCTTTAAATACCTCTGCTG 13680 

CCCATGCATTTTGCCAAGTTAACCTAAAGTGGCAGCTGAATGGCTCGTTATTGCAGTGGT 13740 

GGCTCTCAACGGCTTCATGTCGATGATTTTCGTTGGATCAAGGAGCCCACTCGACTGAAG 13800 

GCTCAGCTTATTAATGTGGTGGAGACCTACAAGGCTGCACAAACAGAGACGTTAAAGTAC 13860 

TATATATCATCTGCAACTGAGCGTGTGGCTCATGTGGAGGCAGCCGAGGTCAACAATGCG 13920 

GAAATGGAGCTGCATCCTGCTGGGT7GAAGTACCCTCTGTCCTTCGTCT7TACCTCCCTG 13980 

GCCGTGGCTACAGCCTGCAAGGAGAACAAGCATCTCTTGTGCGAGGAGCA71TGGAGGGG 14040 
GACTTGATATCGTGCGTCGTTCCTCCCTATCAGACAAATGTCTCACTCGCTGCTTTAAGG

GAGCTCCACAATTCCATTTCGGGAGGAGGGTACCAGGAACAAGCAGACATGGATTATTTT 

GTGGCGATCATCCCAAATGATAATTTCGACTATCAGAGCTGCGAAATCGACACACGAAGT 

TGCGGTAAAGGACTTTGCAAGATTTATAGTAGGGAACTGGGAGGGCAGCCTCTAGCTTAT 

GACGCCATACTGGCAATCGGCAAGGTGCTGCTGCTGGAATAGATAGTGGGCCGCTGATCC 

GAGTTTGATTTTGTCGTATTATGTTACGTGAACTTTTTATCATGCATGTTTCGCTTATGC 

TCCCGAGTGTCGGCCATGTTGTTGTGTTAAAATAAAAGGCTGATGTTAAGTCCTATTGTA 

AAATACCTTTATAGATTAAATATATATAGTATAACTTCTGTATGCCGTCGATGAGCGGTT 

ATATGATTGTAATCTATACGTTGTTGCAATCAATCGTATTACAGTGAGCCGTGCTTAATG 

AAATAAACATCATGTTAAATGTCTATTTATTCAATCAACATGCGCTGACAATAATCAAAA 

GGGGAAACGTAATAACATTGCGGTGGATACAGCGTTTATTGGGAGGTCCGCGGGCCGATA 

CACTTAAATAACATAGACAGAATTTGAGAGAGCACGCAGGTTGTAGCCAAGTTGAGCGAC 

TTGCCGGTAGCACGGAAGCTAAGCTCAGGTGTTACAAATAGACAGGCGTCGAGGCGACGA 

GCACGACGACCTTGCCGGACATTGCGGTCGCAGGGGGCTCAAAGCGGTTGGCTTGTAACG 

GACCTTGTGTTTCTTGTTGTAGCTTTCATCGAGCATAACCATTGGGACGGTTGCTGAACA 

ACGGTAACGCACTTTTTTCACGGGAGCGAGGTAGAAGAACATATTTCCCCGTCGGCAGCC 

GGCGGTGAGCATGCCAATTCCTAAGGGATCAATGGACTCGTGCGAACGGTGAGCATGCCG 

TTCTGACCGTCGGTGCCCAATCAGCAGGCCACTCCCAACATGTTTTCCAAGTCCTTAAAA 

CCAGTCTTTATAGCATTGATCTCCCAGCAATCTTTATTGAAGTCGATTTTAATATTCAAA 

AGAAGATTTTAGTGGAAAGGGAATATAATCGCGTGGCCGAAGAAGAGCCTTCAAAAATCA 

GAATCCACTAGGATAAACAATAATATCTGAAAAGCATTGAATTTGGGTTAGGCACGAGAG 

GCTGACGCGGATGCCACTCGATTGCTAGTGGAAGGATTCCCTTTTTTCTAGCGTATCGAA 

TTCACCGTTTCACTATATGTTTTCCTGATTGGTTGATCTGCGGGACCACCATTGACTGCC 

ACTAATATCGAAAGTGGGTCTGCTTCGATTATGATGCTTTGTGAGAGGTTCTCTTCCCAA 

TGCATGCAAGCTGGCAGATTCGGATACTCTCAATAGAGATCTTATTTCGCGTCTCAAAAA 

G7TCCCAGAAATCAACAAAGGGGAGGGCAGGTCCTTTAAATACG1TGCAGC7GTCCTTTA 
AAATAGAAGAGAATTTACAGCTGGAGGCACAGACCACTAAACTGCGAAAGTAAGCATGGC 15660

AGATGAGTTGGAGCGTCAATTGGAAGCCATTTCTCTCATTACAGTCCTGGGTCCGGATGT 15720 

GAAGGCTGAGCTTGAGGCGGAGCTACGAGACTACTGCGAAGATCTCGACTTCTGGAAAAG 15780 

CCACGGTTTACCGGTGGCGGATCTCGATCAGACTGTGACTGTCGACAAGCTTCTATACAT 15840 

GTATATGGATCGGGCAACAGCAGACCTGTGTGTGAAGAATCGCTGCCTCGTTTGCAACAG 15900 

TGGCAATTCAGCCGCAAAAGTAACCTCGCTTCCACCATACCTTGCAGGCGTGACAAGCGC 15960 

CGAGGCCTATGAGAAACTCAACTCCATTGTTGATGGGAGTGTCGCCCCCCAATCTCGTGG 16020 

GCCTCCCTGCTATTTTGTGGCGTTCCTGCCCAGCAGCTGTTTCGAGAAAACCAGTGAGAT 16080 

ATCGGTGCGCACAGTGGACGGCGAGTGTGGCCCCTTCGATGTCTTTACCCGGCAGCGTCA 16140 

GCCACAGGATCAGAGTGATATGTTTTTTAAATATGAAGGAGTTGTATGTGCTGGAAAGAG 16200 

TGTATTTATGTAAGAATTATCTTTTATAGCCTGTGTTAC6TTTGAACCCGGTCCGCGCGG 16260 

TATTGTTTTCAATAAATGGTATGTGCGGAGGATATAATTGGTCTTTCATTGGTGTGATTT 16320 

ACGTGTAACGCGGATAATAATAAAGTAAATTACAAAAGAGAAACGCATAATTTTATTCCA 16380 

GAATGATTGCGAGAAACGATGAAAATACATGAAAATGCATATTGTCGCCAGGGAAGGATG 16440 

GCGCCGAAATAAACGAAACTGAGCCAATACAGTGACTTGCCAAGCGAGTTTGATCCTACC 16500 

AAATTCGCGCAAATTAATGCCCGTGTTCCATCGGGCCAGCGAGTTTATTCAAAAGAGTTT 16560 

CGTACACGTGGGCGGCGACGGCAACGTCAATGCTTGCTAGCCCTACCGGCGAGAAGTTGG 16620 

CCGGCCCCTTCCATGCCTTGAGGTCATTCATCAAGGCCTCGTCATCGAGAATTTCGGTGT 16680 

AGTTCTTGATCCCATCGCGCTTGCCGTGTTGGGTCAGTTTCATACCGCGCCTAGAATAGT 16740 

AGAGGGCAACGGCATCAACGTTGCGGGCTTCCATCGCAACAAGGTCATCGGCGACAATTA 16800 

GACCATCCGCAGATAGGACATGCTCAATGTAATCCGGCGGCATGTCATCAATACCGAGTG 16860 

ACAAAGTGACTGCGTTGGGGGCGATTTCAGCGGCTTCGAATACCGGTTTTCCGTAGTTGG 16920 

TCGCCATGATGACGAATTGAGAATATGGCAAAAGGCTACGATCGCCGACAGCTTCAAGGC 16980 

TAAAGGTTACGCAATCACGTAACTTTTCGACGAGCTCGAAATTGGATTTCTTACCGCGGC 17040 

TGAGCACTGCTACCTTACGAATTCTCTTAGCGGCACCATAGTTAAGTGAGAGAATTACAG 17100 

CTTCGGCAACTTTTCCAGCCCCAAACAAGAAAACGTCGATGTCCTC7CTGCCTTGCAACA 17160 
6CAGGTTTACGCAT6CTAGC6AGAACCAACCCGTTCTTCCATTAGAAATTGCCACGCCCT 17220 
CTACCGACATAAGGAGCGTCCCGGACACCTTGTCGCGCAGGAAAATATCGGAGTGCTGGA 17280 
GCGGCTTTCCGGTAGCGGCGTTGGTTGGCGCGAAGTGGATGTCTTTGGTGCCGGAATATC 17340 
TTCCGAAATAGCCAATGAGTGCTCCTTCAGTCCATCCAGGAACATTCTTGTTGAACGTTA 17400 
GGTAAGCTTTGACATGTCCGGCTTTTCCTGCGGCAAACACCTCCCAATAGGACTTGAGAG 17460 
CTTCGTCAACAAATGCTGGTGTGATCTGGATATCGAGGTTTGATAGTGCAGATTCAGTCC 17520 

AGTGTACCTCGCAAAGTTGTTTGGCCATCTGCCT7GTAGGTGCGAATTTTCTCTGCTCAA 17580 

ATTGTTGAGGTTAGCGGATTTGTAAACGCGTTTATATGGGCTGCTTGGAGGGTACTTTTG 17640 

GATTAATTTTTTTCT6CCAGCGCATTCTGACGCGGCACCGCTTTGGAAAGTGCGCTGTGG 17700 

GTCCGCGTTTTCTACAATAATGTGCCGATCCGGTCAGAAAGTATATGGATGAGTTGTGCC 17760 

AGCCTCACCAACGTGCTGCAGGCCCATCATGACTACTTCAATGTTAATGGGGGTAATGAA 17820 

TAAATAGGCGAAATTGGGTTCACGGTGGGCCCAGGGAATATAATATTGCCGCAGAGGTAG 17880 

TCGGATGCCAAGGCCCGCAACTAATAGTTCACGAACAAATTCATTGTAGTGGGCGGCCAA 17940 

CTCCAAAACCAATTGCCAGTTATTGTATTGCAATACATATATGAGTATTCGGATACAACT 18000 

AATTTCATTAAATAATATTTTAAGTGTGGACAGAATAGCGCCTAATAAATTTGCGAATGT 18060 

TGTCCAATTGACGTTTTTATAGGTAACTCGATAAATCGTGCTTTTGTGATATTCTGATGC 18120 

GGACAATATACATTTAAACATAAAGATATAAGTTATTGAGGCATTTATGTATATTACAAT 18180 

AGTGGGGTACATTTTTCACAGATGCTGTCACCCATGAAATATTGGCAAAATACTCTTAAA 18240 

ATATGCAAGAAACTAAAGAGGATGCATGGGTTGGGCTGTAGGTACATGGATGCAAATGCT 18300 

GTTTTGCAATAAGTCATATAGTCTCGTCTGTTGAGTGAGGCCCATTCAATCAGCAAGTAG 18360 

GACTGAGGTGCATGATCGACATATTTTTGAACCAGAGTTTTGGCAAGTTTTTCATACAAA 18420 

TGCACGGCTACGGCCAAATCGTAGCTTGCAAGTCCAACTGCTGAAAAGTTAGCCGGCCCG 18480 

TTCCAAGAAATTAGCCTTTGCATAAGGACTGGATCGCGGAGAACTTCAGAGTAGTTCCTG 18540 

ATCCCATTGTCCCTGCCGTGTTTTGTTAGCTTTAAATGGCGTCTTGAATAGTGCAGCGCC 18600 

AACGAGTCGATAT7ACGTGTTTCCATCGCATCCATATCATCTGCCACCACGATGCCACTC 18660 

AGC1TCAACACGTGATCAAAATAG7CAGCTGGCAATTCGTCAATTCCAAGCGTCAATGTA 18720 
AeGGCATTGTCTGTGATCTCCTTCATCTCAAAGACGGGCTTGTTTGAATTCGTCGCCGTA 18780 

ATTATGAACTTGGATTTGCTGAGATATGCTCGATTGTTAACAGCCTTGAGTGAAATCTTG 18840 

ACTTCCGGCTGAAGCCTTTGCACCAACTCATGGTTTGACTGGTTGCAGCGGCTGAGAATC 18900 

GCGATTCGTTGAATTCTTCCAGATGCTCCCGAATTGAGGGCGAGGATGATGGCCTCGGCA 18960 

ACTTTACCTGCTCCGAATAGGAAGACATTGATCTGGCTTCGGCCCTGCAATAGGAGATTC 19020 

AGGCATGCTAGTGCCAGCCAACCAGTTCTCCTCTCCGATATAGCCACCCCATCAACAGAG 19080 

AAGAGACGTCTACCTGTGAAACGATTGCGAAGCCAACGTCGATGTGAGAAGTCGGTTCTT 19140 

TGTATCTCGCGTTTGACGGATTAGAATGGATGCTTTTCACACCCGAATAGTCGCCGACGA 19200 

AACCCACCAGAGCTCCCTCCGTACAGCCCTCTCGATCAAGTGGAACGAAGACCTTGTTGT 19260 

GGCCGAGCCGCCCTTCAGCAAAGAGGTGCCAATAATCTTTCAAGGCATCCGCGACGAGTT 19320 

CCGGTGTAATGTATATTCCAAAAGCCGATAGAGATTCCTCTGTCCAACATTGCTCGTGTA 19380 

TTTGATCGGCCATGTTTGTGTTTGATCAGCCTCCTTTCGAAAATTTCTTGAGTTTCGAAT 19440 

AATTCTAAAATCGAAGGACGATTAATAGTGCCATACCAAGACAAGAAGGGTAGGTGGGCC 19500 

ATCAATCCACAAGCCTAGCACATTTTGCTGTCTGCTCATGCAAGGTATCCAATGGAAGCC 19560 

TGGATTGGTTAGCCGAACTTGGTGGGTTCAATTGGAGCGGGCAGGTCACTTTTTGTCTCT 19620 

CAAATAACTGAAACTAAGTTTTGTTATTTGGTATGTGTTTGTCTGTTCTGCCGAAGGTGC 19680 

CCGAATTTGCGCAAATTCCTTTCTAAAAAGGCTTACATCTAGCAAAAGGTGAGCCCTGTG 19740 

CATCCCAGCATTTGGACAAAGCGCGCCAATTCGGACAGCGACTGGCTGCGTTGGAGGCTC 19800 

GGATCTCAAAGAATAGAAAAGAGTTATGATCATGTTCAGAACCGCCAATTTTGTGCGGTA 19860 

TGAGCTCTTTGATGAAAGTAATGGTTTCAAAAAAGCAACATCGTGGGTGAAAGGTACCTA 19920 

CATATCTTCACAGACAATAACTACTGTTGCTGTTTGCTGATTGACTGACAGGATATATGT 19980 

TCCTGTCATGTTTGTTCAATTGTTCAATTGTTCAATTGTTCAATTGTTCAATTGTTAATG 20040 

TATAAGTTCGTGATGAAGGATGGTTGTTTTAAAAATAGTATGTTTGACTGAGGTTAAGTC 20100 

ACTCACGTTTTGCACATCGACGGACCGTAAGCATTCTTTCGGTAAGACCGAAGCTCGTCC 20160 

CAGATAA7AGGCCCCGTGGAGGGAGGCCTTGTATGGGCCGACCGATGGGCGTGCTGAGCC 20220 

GAGTACGGCGACGCCTGCGGCGATTGCGCGGGCGGCACTGCGCGCAGGGGCACGGG1TCA 20280 
TAC6AGGACGAGCGTAAGGG6CATAGAGCTTTCCGCCCGTCGGGTTTCAGCCATATT6CT 20340 

TGATTGCGGCCGACTGGAATGCAGCCAGGTCGTGCTCGCCGGCGGCGCCTGGTCGAGCGG 20400 

CATGCTGCGCACCTCAGTGTTCGGCTTCCTCAGCTCACGGTTACTGTGTCGGCGCTAAGA 20460 

ACCGAGGCCTTTGATGGCGGACCTGGCCTTTTCAATCAAGGGATGCTGATTTTCCATCCG 20520 

TAAGCGTCTCGACGGTGGTTACACCGTCGGCTTCGGCGCGACGATGAGAACCGAAATCGT 20580 

CAGCGATAGCTTTCGTTTTCTGTCGGATTATTTCCTCCTGATGCGAGAGGAATGGCTTTC 20640 

TATCCGGCTGCCGGCGGGTGCGCGCGTCCGGATTGACCCTCCCGTTCGTTGCTACTTGGC 20700 

TCGAGTGACGAAATAGCACGCCTGTGCCGCTGTATCATGTCCATCGGGCTCACAGGAGAT 20760 

TCGCTCGTAGCGCGTTGGTGTCACTCACCAACACGCGTCGTCGCACCAAATTGGGGAGGA 20820 

TGGTAGCGGAATCCTAAAATCCTAAAACCATACCGACGCGTCACGGCGCTCGTGACCCCT 20880 

GCGAGCGACGCGGCACTCTCTCACCTGATCCGTGCTGCGGTTGCTCAATACGCAATGAGC 20940 

ATTGTCACGGTTCTCAGGGTAAACGGCAATCTCTTCGTCATGCGGGCGTGGATGCTATCA 21000 

CCGTTAGAAAGGGCCTGCCCCCATGGTGGGTCTCTAAGGTTCAGTCTGAGAAGGGGCAGC 21060 

CAGAGCGGCACTGTTTGAAGAGCAGTCTGAACCGCTCAGATCGCTCGCATCGATGCTTGG 21120 
GCGGCG 21126 